Evaluating The Performance Of Variable Selection And Forecasting Methods Using Big Data
Author: Faridoon Khan


Statistical learning has two primary goals: achieving high prediction accuracy and identifying the relevant predictive variables. Variable selection is crucial when the true underlying model has a sparse representation. Identifying the important predictors improves the forecasting ability of the fitted model. Numerous variable selection methods are discussed in the literature, but different methods select different subsets of variables, and their performance varies across circumstances. Comparing them makes it possible to evaluate their relative performance. This study compares Autometrics with machine learning techniques, including Minimax Concave Penalty (MCP), Elastic Smoothly Clipped Absolute Deviation (E-SCAD), and Adaptive Elastic Net (AEnet). In the simulation experiments, three kinds of scenarios are considered, allowing for multicollinearity, heteroscedasticity, and autocorrelation, with varying sample sizes and numbers of covariates. First, we evaluate performance under huge big data. Under low and moderate multicollinearity and autocorrelation, the considered methods retain all relevant variables, but MCP and E-SCAD over-specify the true data generating process (DGP). Under extreme multicollinearity and autocorrelation, AEnet performs comparatively better. Under heteroscedasticity, AEnet specifies the true DGP very efficiently. The forecasting performance of these methods, along with factor models, is evaluated under the same conditions. MCP produces more accurate forecasts than the rival methods, except in a few cases where the proposed factor model and E-SCAD outperform the competitors. Under fat big data, E-SCAD remains very effective relative to competing approaches in terms of variable selection. In the forecasting exercises, Autometrics remains quite successful.
To complement the simulation exercise, we carry out an empirical application on a widely used macroeconomic and financial dataset for Pakistan. The empirical results support the findings of the simulation experiments.

Meta Data

Supervisor: Amena Urooj
Cosupervisor: Saud Ahmed Khan
