MAHALANOBIS DISTANCE AND ITS APPLICATION FOR DETECTING MULTIVARIATE OUTLIERS

Hamid Ghorbani

DOI: https://doi.org/10.22190/FUMI1903583G
Pages: 583-595

Abstract


While methods for detecting outliers are frequently applied by statisticians when analyzing univariate data, identifying outliers in multivariate data poses challenges that univariate data do not. In this paper, after a brief review of some tools for univariate outlier detection, the Mahalanobis distance, a well-known multivariate statistical distance, and its ability to detect multivariate outliers are discussed. As an application, the univariate and multivariate outliers of a real data set are detected using the R software environment for statistical computing.
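
As a rough illustration of the idea summarized in the abstract, the squared Mahalanobis distance of an observation x from the sample mean m, with sample covariance S, is (x - m)' S^{-1} (x - m). The sketch below is not the paper's R code. It is a minimal, standard-library Python version restricted to two variables, and it assumes a common convention that the abstract does not state: observations are flagged when their squared distance exceeds a chi-square quantile with 2 degrees of freedom, here at the 0.975 level. The function names are hypothetical.

```python
import math


def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each 2-D point from the sample mean."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Entries of the sample covariance matrix (denominator n - 1).
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2  # must be non-zero (points not all collinear)
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Quadratic form (x - m)' S^{-1} (x - m), with the 2x2 inverse written out.
        distances.append((syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances


def chi2_quantile_2df(p):
    """Quantile of the chi-square distribution with 2 degrees of freedom.

    With 2 degrees of freedom the CDF is 1 - exp(-x / 2), so the quantile
    has the closed form -2 * ln(1 - p).
    """
    return -2.0 * math.log(1.0 - p)


def flag_outliers(points, p=0.975):
    """Flag points whose squared distance exceeds the assumed chi-square cutoff."""
    cutoff = chi2_quantile_2df(p)
    return [d2 > cutoff for d2 in mahalanobis_sq_2d(points)]
```

Because the mean and covariance are estimated from the same data, extreme observations can inflate them and mask themselves; robust estimators of location and scatter are a common remedy, and this sketch does not attempt them.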


Keywords

Mahalanobis distance, multivariate normal distribution, multivariate outliers, outlier detection.






© University of Niš | Created on November, 2013
ISSN 0352-9665 (Print)
ISSN 2406-047X (Online)