A Data Mining Approach: Classification and Regression Trees (CART) for the Determinants of Earnings for Pakistan
Author: Neelam Younas

This study identifies the determinants of earnings in Pakistan using data mining techniques: multiple regression and classification and regression trees (CART). To improve prediction accuracy, advanced techniques (bagging, random forest, and boosting) are used for both regression and classification. The study uses Labor Force Survey data (2012-13). The main predictors are education, sex, marital status, training, occupation, location of work, experience, and age, among others. Monthly income is the dependent variable for regression. For classification, income is divided into quintiles, and the quintile is used as the dependent variable. Type of industry, education, age, and occupation are found to be useful predictors in both the classification and the regression tree. The regression results show that females earn less than males even when they work in the same type of industry, and that individuals who work in the government, private, or public sectors and have higher education earn more than individuals working in other sectors. The average monthly income of government employees is greater than that of private- and public-sector employees, even at the same level of education. The classification tree results show that individuals who work in government, hold white-collar jobs, are older than 38, and have middle-level education or above belong to Q4 and Q5, whereas individuals working outside the government sector belong to Q1. In the multiple regression, all predictors are useful except marital status. Supervisor: Dr. Zahid Asghar

Meta Data

Keywords: Classification and regression tree, Pakistan
Supervisor: Zahid Asghar
