The College of Administration and Economics at the University of Baghdad discussed a PhD dissertation in the field of Statistics by the student Noor Nawzat Ahmed, titled "Using smoothing methods in cluster analysis of longitudinal data with a practical application", under the supervision of Prof. Dr. Suhail Najm Abdullah.
Longitudinal data have become increasingly common, especially in the medical and economic fields, and various methods for analyzing this type of data have been studied and developed. This research focuses on grouping and analyzing such data, since cluster analysis plays an important role in identifying and grouping related profiles over time. The thesis concentrates on the non-parametric cubic B-spline model, which has continuous first and second derivatives at each knot, resulting in a smoother, more flexible curve that can capture more complex patterns and fluctuations in the data. A penalized clustering method was used to divide balanced longitudinal data into subgroups by penalizing the pairwise distances between the coefficients of the cubic B-spline model with a penalty function. Clustering was therefore proposed with a recently introduced concave penalty function, the cubic spline penalty (CSP), which is applied to the pairwise distances within the nonparametric pairwise grouping (NPG) method. This method, in turn, determines the number of clusters through a model selection criterion, the Bayesian Information Criterion (BIC), and its equations are solved with optimization methods. Accordingly, the alternating direction method of multipliers (ADMM) algorithm was applied to obtain approximate solutions for the estimators of the non-parametric model, using the R statistical program. A second, commonly used penalty function, the minimax concave penalty (MCP), was applied with the same clustering method. The clustering method was then carried out with both penalty functions in a simulation study, and the two were compared. Clustering was also performed with the K-means algorithm, and its results were compared with those of the NPG method.
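As an illustration of the penalization step described above, the following is a minimal Python sketch of the MCP in its standard form, summed over the pairwise distances between coefficient vectors. It is not the code used in the dissertation, which was written in R: the function names `mcp` and `pairwise_penalty`, the tuning values `lam` and `gamma`, and the small coefficient vectors in the usage note are all assumptions made for illustration.

```python
import math
from itertools import combinations


def mcp(t, lam, gamma=3.0):
    """Minimax concave penalty evaluated at |t| (standard form, gamma > 1).

    Grows like lam * |t| near zero, then flattens to the constant
    gamma * lam**2 / 2 once |t| exceeds gamma * lam.
    """
    t = abs(t)
    if t <= gamma * lam:
        return lam * t - t * t / (2.0 * gamma)
    return 0.5 * gamma * lam * lam


def pairwise_penalty(coefs, lam, gamma=3.0):
    """Sum of MCP values over all pairwise Euclidean distances.

    `coefs` holds one coefficient vector per subject (hypothetical
    stand-ins for fitted cubic B-spline coefficients).
    """
    total = 0.0
    for a, b in combinations(coefs, 2):
        total += mcp(math.dist(a, b), lam, gamma)
    return total
```

For example, `pairwise_penalty([[0.0, 0.0], [0.0, 0.0], [3.0, 4.0]], lam=1.0)` adds nothing for the two identical vectors and a capped amount for each of the two distant pairs. Because the penalty stops growing for large distances, subjects with clearly different profiles are not pulled together, while close profiles are shrunk toward a shared set of coefficients.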
In the simulation study, balanced longitudinal data were generated for 60 and 100 subjects, with 10 repeated measurements (time points) for each subject. The simulation was repeated for 100 iterations. The results of the experiments showed that the penalized method with the CSP penalty function was successful in the clustering process. Compared with the MCP penalty function, it was more efficient, although the differences between the two were very small. This indicates that the method improves penalized clustering of longitudinal data under the cubic spline model. R software was used to perform these operations. As for the practical application, data were analyzed for patients with kidney failure. The data were obtained by collecting medical examinations related to the disease from the Ibn Sina Teaching Hospital for Dialysis in Mosul over a period of 7 consecutive months during 2023, and the NPG clustering method was applied with the two penalty functions, CSP and MCP. The results showed that the tests were grouped into two clusters. Based on the clarification provided in a memorandum by one of the hospital’s employees, it was found that the glomerular filtration function of the kidneys reaches certain thresholds that determine whether the patient needs dialysis twice or three times a week. May God grant us good health.