The College of Administration and Economics at the University of Baghdad discussed a master’s thesis in the field of Statistics by the student Mustafa Habib Mahdi, titled "Using Some Robust Linear Regression Estimators to Estimate the Mean in Stratified Random Sampling with Application", under the supervision of Prof. Dr. Saja Mohammed Hussein.
Stratified random sampling, when applied to populations for which it is appropriate, plays an important role in obtaining highly efficient estimators compared with other sampling designs.
Estimating the population mean is a fundamental problem in statistical inference. When a population can be divided into subgroups, or strata, stratified random sampling is a commonly used technique for improving the efficiency of the estimator.
In stratified random sampling, the population is first divided into non-overlapping strata, then a simple random sample is taken from each stratum. The goal is to obtain a sample that is representative of the entire population while ensuring that each stratum is well represented.
There are several methods available for estimating the population mean in stratified random sampling, including separate linear regression estimators and combined linear regression estimators.
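The two steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `stratified_sample` and `stratified_mean` are hypothetical, the strata are assumed to be given as arrays of values, and the stratum weights are taken as N_h / N.

```python
import numpy as np

def stratified_sample(strata, sample_sizes, seed=0):
    """Draw a simple random sample (without replacement) from each stratum.

    strata       : list of 1-D arrays, one per non-overlapping stratum
    sample_sizes : number of units to draw from each stratum
    """
    rng = np.random.default_rng(seed)
    return [
        rng.choice(np.asarray(values, dtype=float), size=n, replace=False)
        for values, n in zip(strata, sample_sizes)
    ]

def stratified_mean(samples, pop_sizes):
    """Weighted mean of the stratum sample means, with weights N_h / N."""
    total = sum(pop_sizes)
    return sum((n_pop / total) * s.mean() for s, n_pop in zip(samples, pop_sizes))

# Illustrative data only (not from the thesis): two strata of sizes 10 and 30.
strata = [np.arange(10.0), np.arange(100.0, 130.0)]
samples = stratified_sample(strata, sample_sizes=[4, 12])
estimate = stratified_mean(samples, pop_sizes=[10, 30])
```

Sampling every unit of every stratum reproduces the population mean exactly, which is a quick sanity check on the weights.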
The separate linear regression estimator estimates the mean separately within each stratum and then takes the weighted mean across the strata, where the weights are proportional to the stratum sizes. The combined linear regression estimator, on the other hand, models the relationship between the response variable and the stratum indicator variables with a linear regression model and then estimates the population mean from the fitted model.
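As a rough sketch, the two estimators can be written in their common textbook form, which uses an auxiliary variable x with known stratum population means. That form, the function names, and the argument layout below are assumptions for illustration; the announcement does not give the exact formulas used in the thesis.

```python
import numpy as np

def separate_regression_mean(samples, pop_x_means, weights):
    """Separate estimator: one slope b_h per stratum, then a weighted sum.

    samples     : list of (x, y) sample arrays, one pair per stratum
    pop_x_means : known population mean of x in each stratum
    weights     : stratum weights W_h = N_h / N (summing to 1)
    """
    estimate = 0.0
    for (x, y), x_bar_pop, w in zip(samples, pop_x_means, weights):
        x, y = np.asarray(x, float), np.asarray(y, float)
        b_h = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
        estimate += w * (y.mean() + b_h * (x_bar_pop - x.mean()))
    return estimate

def combined_regression_mean(samples, pop_x_means, weights, pop_sizes):
    """Combined estimator: a single pooled slope applied to the stratified means."""
    num = den = y_st = x_st = 0.0
    for (x, y), w, n_pop in zip(samples, weights, pop_sizes):
        x, y = np.asarray(x, float), np.asarray(y, float)
        n = len(x)
        c = w ** 2 * (1 - n / n_pop) / n  # includes finite population correction
        num += c * np.cov(x, y, ddof=1)[0, 1]
        den += c * np.var(x, ddof=1)
        y_st += w * y.mean()
        x_st += w * x.mean()
    b_c = num / den
    x_bar_pop = sum(w * m for w, m in zip(weights, pop_x_means))
    return y_st + b_c * (x_bar_pop - x_st)
```

The slopes here are classical moment-based estimates. A robust variant would swap in a slope from a robust regression fit or from a robust covariance matrix, which is the comparison the thesis describes.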
In this thesis, separate regression estimators were used to estimate the population mean in stratified random sampling. Separate regression estimates of the population mean obtained with robust methods were compared with those of the traditional separate regression estimator using the mean squared error (MSE) criterion. These estimates were also compared with the conventional estimates using the relative efficiency (RE) criterion.
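The two comparison criteria can be sketched as follows. The definition of RE as the ratio of the reference (conventional) MSE to the candidate (robust) MSE is an assumption, chosen so that larger values favor the candidate; the announcement does not state the exact formula, and the function names are hypothetical.

```python
import numpy as np

def mse(estimates, true_mean):
    """Mean squared error of repeated estimates around the known population mean."""
    estimates = np.asarray(estimates, dtype=float)
    return float(np.mean((estimates - true_mean) ** 2))

def relative_efficiency(mse_reference, mse_candidate):
    """RE > 1 means the candidate estimator has the smaller MSE (assumed definition)."""
    return mse_reference / mse_candidate
```

In a simulation study, `estimates` would hold one estimate per replication, with `true_mean` known from the simulated population.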
The ordinary least squares (OLS) estimator is efficient when the standard assumptions about the errors hold, but it is also very sensitive to outliers.
We found that robust estimates significantly improved the quality of the separate regression estimates of the population mean in stratified random sampling by reducing the effect of outliers, using robust regression methods (Huber MM, LMS, Huber M, LAD) and robust variance-covariance matrices (MCD, MVE). In addition, the results showed that the Huber M and Huber MM methods were the best at dealing with outliers in the data set, as they had lower MSE and higher RE values.
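To illustrate why a robust fit helps, the sketch below compares OLS with a Huber M-estimator computed by iteratively reweighted least squares. It is a minimal illustration, not the thesis implementation: the tuning constant 1.345, the MAD-based scale, the function names, and the synthetic data are all assumptions.

```python
import numpy as np

def ols_fit(x, y):
    """Ordinary least squares fit of y = b0 + b1 * x; returns [b0, b1]."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def huber_m_fit(x, y, c=1.345, max_iter=100, tol=1e-10):
    """Huber M-estimate of [b0, b1] by iteratively reweighted least squares."""
    X = np.column_stack([np.ones_like(x), x])
    beta = ols_fit(x, y)  # start from the OLS solution
    for _ in range(max_iter):
        r = y - X @ beta
        # Robust scale: normalized median absolute deviation of the residuals.
        scale = np.median(np.abs(r - np.median(r))) / 0.6745
        if scale == 0:
            break
        u = np.abs(r) / scale
        w = c / np.maximum(u, c)  # weight 1 if u <= c, otherwise c / u
        sw = np.sqrt(w)
        new_beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta

# Synthetic data (illustrative only): a line with small noise and one gross outlier.
x = np.arange(20, dtype=float)
y = 1 + 2 * x + 0.5 * (-1.0) ** np.arange(20)
y[10] += 100
beta_ols = ols_fit(x, y)
beta_huber = huber_m_fit(x, y)
```

On data like this, the single outlier pulls the OLS intercept well away from the true value of 1, while the Huber fit downweights the outlying residual and stays much closer.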
Combined regression estimators were also used to estimate the population mean in stratified random sampling. Combined regression estimates in which the regression parameter was estimated from robust variance-covariance matrices were compared with those based on the traditional variance-covariance matrix, using the two criteria of relative efficiency (RE) and mean squared error (MSE).
We found that robust estimates led to a significant improvement in the quality of the combined regression estimates by reducing the effect of outliers, using robust variance-covariance matrices (MCD, MVE) when estimating the regression parameter. The MVE estimator handled outliers in the data set well, as it gave lower MSE values.
To achieve its objectives, the thesis was divided into four chapters. The first chapter contained the introduction, the objective of the research, and the literature review. The second chapter covered the theoretical side of the robust methods for estimating the population mean, along with some important concepts of stratified random sampling. The third chapter covered the experimental and applied side, in which two studies were conducted. In the first, simulation was used to compare the estimation methods under study and to identify the best estimator of the population mean based on the two statistical measures of relative efficiency (RE) and mean squared error (MSE), the best estimator being the one with the lowest MSE. In the second, real data were used to verify the performance of the robust methods in practice.
Finally, the fourth chapter presented the researcher's conclusions and recommendations. Overall, it was concluded that the Huber MM method was the best at dealing with outliers in the data set, and that the MCD estimator was the best at estimating the population mean among the estimation methods studied.