The College of Administration and Economics at the University of Baghdad discussed a master's thesis in the field of Statistics by the student Hayder Osman Hussein, titled "Comparison Between Partial Least Squares And Principal Components Methods Based On The First Component With Application", under the supervision of Asst. Prof. Dr. Rabab Abdul-Rida Saleh.
Regression analysis is one of the important topics in statistics and in many other sciences that rely on data analysis. It is used to understand the relationship between one or more explanatory variables and a response variable, to predict future values, to identify the explanatory variables that most affect the response, and to build regression models with high explanatory and predictive power. The explanatory variables can, however, suffer from problems such as multicollinearity, which inflates the variance of the parameter estimates and makes it difficult to estimate the parameters and to separate the effect of each explanatory variable on the response. The parameters then cannot be interpreted accurately, and the resulting statistical indicators are unreliable.
Several methods can address this problem. This research relied on two of them, the partial least squares (PLS) method and the principal components method, each with different algorithms: SIMPLS and O-PLS for partial least squares, and NIPALS and SVD for principal components. To improve the accuracy of the estimated regression coefficients and their interpretability, a genetic algorithm was applied to both methods, and the methods were compared before and after its use.
In the experimental aspect, several models and different sample sizes were assumed in order to choose the best method, with the mean square error (MSE) as the comparison criterion. All simulation results indicated that the PLS method with the O-PLS algorithm, both before and after using the genetic algorithm, was the best because it achieved the lowest MSE, so this method can be used to estimate the parameters of the regression model.
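As an illustration of the kind of simulation described above, the following is a minimal sketch of a first-component comparison between PLS and principal components regression (PCR) under multicollinearity. It is not the thesis's procedure: it does not implement SIMPLS, O-PLS, NIPALS, SVD or the genetic algorithm, and the collinear design (a shared factor plus small noise), the true coefficients (1, 2, 3), the sample sizes, and the use of hold-out prediction MSE are all assumptions chosen for the example. All function names are hypothetical, and only the Python standard library is used.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mean(v):
    return sum(v) / len(v)

def normalize(w):
    norm = dot(w, w) ** 0.5
    return [x / norm for x in w]

def pls_direction(X, y):
    # First PLS weight vector: proportional to X'y for centered data.
    return normalize([dot(col, y) for col in X])

def pca_direction(X, y=None, iters=200):
    # First principal direction: leading eigenvector of X'X, found by
    # power iteration. The response y is ignored, unlike in PLS.
    S = [[dot(a, b) for b in X] for a in X]
    w = normalize([1.0] * len(X))
    for _ in range(iters):
        w = normalize([dot(row, w) for row in S])
    return w

def fit_one_component(X, y, direction):
    # X is a list of columns. Center the data, build the score t = Xw,
    # regress y on t, then map the slope back to coefficients on X.
    x_means, y_mean = [mean(c) for c in X], mean(y)
    Xc = [[v - m for v in c] for c, m in zip(X, x_means)]
    yc = [v - y_mean for v in y]
    w = direction(Xc, yc)
    t = [dot(w, row) for row in zip(*Xc)]
    q = dot(t, yc) / dot(t, t)
    coefs = [q * wj for wj in w]
    return y_mean - dot(coefs, x_means), coefs

def predict(model, X):
    intercept, coefs = model
    return [intercept + dot(coefs, row) for row in zip(*X)]

def mse(y, yhat):
    return mean([(a - b) ** 2 for a, b in zip(y, yhat)])

def make_data(n, rng, beta=(1.0, 2.0, 3.0)):
    # Assumed design: every predictor is a shared factor z plus small
    # noise, so the columns of X are strongly collinear.
    z = [rng.gauss(0, 1) for _ in range(n)]
    X = [[zi + 0.1 * rng.gauss(0, 1) for zi in z] for _ in beta]
    y = [dot(beta, row) + rng.gauss(0, 1) for row in zip(*X)]
    return X, y

def simulate(n=50, reps=200, seed=0):
    # Average hold-out prediction MSE of each method over `reps` replications.
    rng = random.Random(seed)
    totals = {"PLS": 0.0, "PCR": 0.0}
    for _ in range(reps):
        X_train, y_train = make_data(n, rng)
        X_test, y_test = make_data(n, rng)
        for name, direction in (("PLS", pls_direction), ("PCR", pca_direction)):
            model = fit_one_component(X_train, y_train, direction)
            totals[name] += mse(y_test, predict(model, X_test))
    return {name: total / reps for name, total in totals.items()}

if __name__ == "__main__":
    for n in (25, 50, 100):
        print(n, simulate(n=n))
```

Running the script prints the average hold-out MSE of both methods for each assumed sample size. The two methods differ only in how the single direction is chosen: PLS uses the covariance between the predictors and the response, while PCR uses the predictors alone.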
In the applied aspect, the comparison was carried out on real data from patients with high blood pressure, again using MSE as the comparison criterion. The results indicated that the partial least squares method with the O-PLS algorithm was the best, both before and after using the genetic algorithm.