Comparison of Adaptive Elastic-net and Elastic SCAD using High and Low Dimensional data and Application on Healthcare Expenditures for Pakistan
In regression analysis, a central goal is to identify the explanatory variables that matter for predicting the response. Covariate selection is important both for knowledge discovery and for the prediction performance of the fitted model. Including too many regressors increases the variance of the fitted model, while including too few yields unreliable estimates. In practice, variable selection is of great importance in many disciplines and has been the focus of much research. Ordinary least squares (OLS) often performs poorly in both prediction and interpretation when the number of independent variables is large or when the variables are highly correlated. In this study we model, investigate and compare the performance of two regularization techniques, Adaptive Elastic-net and Elastic SCAD, on high-dimensional and low-dimensional data at different levels of correlation and numbers of observations, through simulations and an empirical application to real-life data on healthcare expenditures over an extended time period for Pakistan. Adaptive Elastic-net and Elastic SCAD build on and improve upon ridge regression, LASSO, Adaptive LASSO, Elastic-net and SCAD. Mean squared error (MSE), the number of false positives, the number of false negatives and the F1 score (F-measure) are used as performance criteria. In the simulation experiments, Elastic SCAD performs well when the correlation between independent variables is around 50% or less, at any number of observations; otherwise Adaptive Elastic-net performs better than Elastic SCAD. Adaptive Elastic-net is affected mainly by false positive variables, and Elastic SCAD by false negative variables. The performance of Elastic SCAD can be improved by increasing the number of observations. As correlation increases, the performance of both covariate selection techniques declines, while increasing the number of observations improves it.
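The selection criteria named above can be illustrated with a minimal Python sketch. It assumes that a variable counts as "selected" when its estimated coefficient is nonzero (beyond a small tolerance), and it computes MSE on the coefficient estimates; the thesis may define MSE on predictions instead. The function names, the tolerance and the example coefficients are hypothetical and are not taken from the study.

```python
def selection_metrics(true_coefs, est_coefs, tol=1e-8):
    """Count false positives/negatives and compute F1 for variable selection.

    Assumption: a variable is 'relevant' if its true coefficient is nonzero
    and 'selected' if its estimated coefficient is nonzero (|b| > tol).
    """
    if len(true_coefs) != len(est_coefs):
        raise ValueError("coefficient vectors must have the same length")
    true_support = [abs(b) > tol for b in true_coefs]
    est_support = [abs(b) > tol for b in est_coefs]
    pairs = list(zip(true_support, est_support))
    tp = sum(1 for t, e in pairs if t and e)        # relevant and selected
    fp = sum(1 for t, e in pairs if not t and e)    # irrelevant but selected
    fn = sum(1 for t, e in pairs if t and not e)    # relevant but dropped
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "f1": f1}


def coefficient_mse(true_coefs, est_coefs):
    """Mean squared error between true and estimated coefficients."""
    if len(true_coefs) != len(est_coefs):
        raise ValueError("coefficient vectors must have the same length")
    n = len(true_coefs)
    return sum((t - e) ** 2 for t, e in zip(true_coefs, est_coefs)) / n


# Hypothetical example: 3 relevant and 2 irrelevant variables.
beta_true = [3.0, 1.5, 0.0, 0.0, 2.0]
beta_hat = [2.8, 0.0, 0.4, 0.0, 1.9]
metrics = selection_metrics(beta_true, beta_hat)
mse = coefficient_mse(beta_true, beta_hat)
```

In this example the second variable is a false negative and the third a false positive, so a method prone to over-selection would raise `fp`, while a method prone to over-shrinkage would raise `fn`.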
The results of the empirical analysis are similar to the simulation results. In modelling per capita healthcare expenditure for Pakistan, both methods selected the same variables as having a significant relationship with per capita healthcare expenditure; because of the very high correlation between the variables, Adaptive Elastic-net performed well. Urban population, GDP per capita, net official development assistance and official aid received, public health expenditure, population aged 65 and above, and primary school enrollment show a positive and significant relationship, while population aged 14 and below shows a negative and significant relationship with per capita healthcare expenditure for Pakistan. The analysis reveals that Adaptive Elastic-net shows better prediction performance than Elastic SCAD when the correlation among variables becomes very high; otherwise Elastic SCAD is preferred. The analysis also reveals that when a model or study is sensitive to the number of false positive variables, Elastic SCAD is preferred, and when it is sensitive to the number of false negative variables, Adaptive Elastic-net is preferred for finding the relevant variables.
Supervisor: Dr. Amena Urooj