Outlier Detection For Skewed Distribution: Bivariate Case
Most real data sets contain observations that do not conform to the rest of the data. These observations are known as outliers and may be caused by human error or by natural variation. It is important to detect outliers because they can affect, positively or negatively, the results of regression analysis, forecasting, ANOVA and similar procedures. Outliers are also useful for identifying the most remarkable events in cross-sectional data, and important events can generally be identified by detecting outliers in time series data. Numerous techniques have been discussed in the literature for detecting outliers in univariate, bivariate and multivariate data sets. Most of these techniques work well when the data are normal but give misleading results for skewed data. Various techniques exist for detecting outliers in univariate skewed data, but very few are available for multivariate skewed data. Because multivariate data have many practical uses and are needed to study the relationships between sets of variables, it is important to detect outliers in the multivariate case. Adil (2011) proposed a technique, SSSBB, to detect outliers in univariate skewed data only and showed that it performs better than the existing techniques. In this study, we extend SSSBB to the bivariate case and compare its results with those of the robust Mahalanobis distance technique under various types of distributions. Monte Carlo simulations are used to compare SSSBB and the Mahalanobis distance.
The study considers the normal, chi-square, gamma and beta distributions with different sample sizes to evaluate the performance of SSSBB for bivariate data, and finds that SSSBB performs well compared with the Mahalanobis distance in all the cases considered in the thesis. On the basis of the ratio of outliers detected and the area of the fence, the results show that SSSBB is a better method for normal as well as skewed data sets because SSSBB detects the possible outliers in the specified area of the fence.
Supervisor: Dr. Atiq-ur-Rehman
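To make the Mahalanobis distance benchmark mentioned in the abstract concrete, the following is a minimal sketch of bivariate outlier flagging inside a small Monte Carlo loop. It rests on several assumptions that are not taken from the thesis: it uses the classical sample mean and covariance rather than a robust estimator, it uses a chi-square cutoff at the 0.975 level, and all function names, sample sizes and the seed are hypothetical. SSSBB itself is not reproduced here because the abstract does not describe its construction.

```python
import math
import random


def mahalanobis_sq(points):
    """Squared Mahalanobis distance of each (x, y) point from the sample mean.

    Assumption: classical (non-robust) mean and covariance are used here;
    the thesis compares against a robust variant.
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("sample covariance matrix is singular")
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Quadratic form with the closed-form inverse of a 2x2 matrix.
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances


def chi2_quantile_2df(p):
    """Quantile of the chi-square distribution with 2 degrees of freedom.

    With 2 degrees of freedom the distribution is exponential with mean 2,
    so the quantile has the closed form -2 * ln(1 - p).
    """
    return -2.0 * math.log(1.0 - p)


def flag_outliers(points, p=0.975):
    """Indices of points whose squared distance exceeds the chi-square cutoff."""
    cutoff = chi2_quantile_2df(p)
    return [i for i, d in enumerate(mahalanobis_sq(points)) if d > cutoff]


def simulate_flag_rate(n=100, reps=200, p=0.975, skewed=False, seed=0):
    """Share of simulated observations flagged as outliers (hypothetical setup)."""
    rng = random.Random(seed)
    flagged = 0
    for _ in range(reps):
        if skewed:
            # Chi-square with 2 df is a gamma distribution with shape 1, scale 2.
            pts = [(rng.gammavariate(1.0, 2.0), rng.gammavariate(1.0, 2.0))
                   for _ in range(n)]
        else:
            pts = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]
        flagged += len(flag_outliers(pts, p))
    return flagged / (n * reps)


if __name__ == "__main__":
    print("normal data :", simulate_flag_rate(skewed=False))
    print("skewed data :", simulate_flag_rate(skewed=True))
```

Comparing the two printed rates gives a rough sense of how a normal-theory cutoff behaves on independent chi-square data; the thesis's own comparison additionally accounts for the area of the fence, which this sketch does not attempt.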