Multivariate outlier detection in Python
Six methods for detecting outliers and anomalies in your dataset
In my previous Medium article I introduced five different methods for univariate outlier detection: distribution plots, Z-scores, boxplots, Tukey fences and clustering. That article showed that several different methods can be used to detect outliers in your data, but that each of them can lead to different conclusions. When selecting a method, you should therefore pay attention to the context of the data and to what domain knowledge suggests should be classed as an outlier.
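To illustrate how two of the univariate methods mentioned above can disagree, here is a minimal NumPy sketch on a small synthetic sample (not the Pokémon data). The thresholds used, |z| > 3 for the Z-score and 1.5 × IQR for the Tukey fences, are common conventions assumed here rather than values taken from the previous article, and the helper names are hypothetical.

```python
import numpy as np

def zscore_outliers(x, threshold=3.0):
    """Boolean mask of points whose absolute Z-score exceeds `threshold` (assumed default: 3)."""
    z = (x - x.mean()) / x.std()  # population standard deviation (ddof=0), NumPy's default
    return np.abs(z) > threshold

def tukey_outliers(x, k=1.5):
    """Boolean mask of points outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR] (assumed k = 1.5)."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)

# Synthetic toy sample with one extreme value (illustrative only)
values = np.array([10, 12, 11, 13, 12, 11, 10, 12, 11, 40])

z_mask = zscore_outliers(values)
t_mask = tukey_outliers(values)

print("Z-score flags:", values[z_mask])  # Z-score flags: []
print("Tukey flags:", values[t_mask])    # Tukey flags: [40]
```

With only ten points, a population Z-score can never exceed √(n − 1) = 3, so the 3σ rule cannot flag anything here, whereas the quartiles behind the Tukey fences do not depend on how extreme the largest value is. This is one concrete way the choice of method changes the conclusion.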
Often, however, data is collected from multiple sources, sensors and time periods, creating multiple variables that could interact with your target variable. As a result, analysis and machine learning methods are frequently applied to more than one variable at a time, which makes it important to detect outliers that arise from the interaction between these variables, not just values that are extreme in a single variable. This article therefore presents several different methods for this purpose.
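To make the idea of an interaction outlier concrete, the sketch below builds synthetic, strongly correlated two-dimensional data (a stand-in for two attributes such as Attack and Defense; the correlation of 0.9 and the seed are assumptions) and appends a point that is unremarkable on each axis but breaks the correlation. It uses the squared Mahalanobis distance as one illustrative multivariate measure; this is not necessarily one of the six methods covered in this article.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the example is reproducible

# Synthetic, strongly correlated 2-D data (assumed correlation of 0.9)
cov_true = np.array([[1.0, 0.9], [0.9, 1.0]])
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov_true, size=200)

# Append a point that is moderate on each axis but contradicts the positive correlation
X = np.vstack([X, [2.0, -2.0]])

# Univariate view: Z-score of each column taken separately
z = (X - X.mean(axis=0)) / X.std(axis=0)

# Multivariate view: squared Mahalanobis distance of each row from the sample mean
diff = X - X.mean(axis=0)
inv_cov = np.linalg.inv(np.cov(X, rowvar=False))
d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)

# Under approximate normality, d2 for 2 variables is roughly chi-square with 2 degrees
# of freedom; its 99.9th percentile has the closed form -2 * ln(0.001), about 13.8
# (this closed form holds only for 2 degrees of freedom)
threshold = -2 * np.log(1 - 0.999)

print("Largest |z| of appended point:", np.abs(z[-1]).max())
print("Squared Mahalanobis distance of appended point:", d2[-1])
print("Threshold:", threshold)
```

The appended point's Z-scores stay well below 3 on both columns, so neither univariate check would flag it, yet its Mahalanobis distance is far above the threshold because the measure accounts for the covariance between the two variables.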
As before, the Pokémon dataset is used to demonstrate these methods, with data on 801 Pokémon from 7 generations. This will focus on the Attack and Defense attributes from this…