Anomaly Detection in Python
Detect anomalies in your data analysis and expand your Python statistical toolkit in this four-hour course.
4 hours, 16 videos, 59 exercises, 4,303 learners, Statement of Accomplishment
Course Description
Spot Anomalies in Your Data Analysis
Extreme values or anomalies are present in almost any dataset, and it is critical to detect and deal with them before continuing statistical exploration. When left untouched, anomalies can easily disrupt your analyses and skew the performance of machine learning models.
Learn to Use Estimators Like Isolation Forest and Local Outlier Factor
In this course, you'll leverage Python to implement a variety of anomaly detection methods. You'll spot extreme values visually and use tested statistical techniques like Median Absolute Deviation for univariate datasets. For multivariate data, you'll learn to use estimators such as Isolation Forest, k-Nearest-Neighbors, and Local Outlier Factor. You'll also learn how to ensemble multiple outlier classifiers into a low-risk final estimator. You'll walk away with an essential data science tool in your belt: anomaly detection with Python.
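As a rough illustration of the Median Absolute Deviation idea mentioned above, the sketch below computes modified z-scores with plain NumPy. The data, the helper name `modified_z_scores`, and the 3.5 cutoff are assumptions made for this example, not material taken from the course.

```python
import numpy as np

def modified_z_scores(values):
    """Return modified z-scores based on the median and the MAD."""
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    mad = np.median(np.abs(values - median))
    if mad == 0:
        raise ValueError("MAD is zero; modified z-scores are undefined")
    # 0.6745 rescales the MAD so it is comparable to a standard deviation
    # when the data is roughly normally distributed.
    return 0.6745 * (values - median) / mad

# Illustrative data (made up for this sketch): one obvious extreme value.
data = [10.0, 11.0, 12.0, 11.5, 10.5, 11.0, 50.0]
scores = modified_z_scores(data)

# 3.5 is a commonly used cutoff; treat it as an assumption to tune.
outlier_mask = np.abs(scores) > 3.5
print(outlier_mask)
```

Because the median and MAD barely move when a single extreme value is added, this score tends to be more robust than an ordinary z-score on small samples.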
Expand Your Python Statistical Toolkit
Better anomaly detection means a better understanding of your data and, in particular, better root cause analysis and clearer communication about system behavior. Adding this skill to your existing Python repertoire will help you with data cleaning, fraud detection, and identifying system disturbances.
1. Detecting Univariate Outliers
Free. This chapter covers techniques to detect outliers in one-dimensional data using histograms, scatterplots, box plots, z-scores, and modified z-scores.
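Two of the techniques named above, IQR-based limits and z-scores, can be sketched in a few lines of NumPy. The data below is invented for illustration, and the 1.5 multiplier and the cutoff of 3 are common conventions assumed here rather than values confirmed by the course.

```python
import numpy as np

# Illustrative data (made up): 20 ordinary readings and one extreme value.
data = np.array([48, 49, 50, 51, 52] * 4 + [120], dtype=float)

# IQR-based limits: anything beyond 1.5 * IQR from the quartiles is flagged.
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
iqr_outliers = data[(data < lower_limit) | (data > upper_limit)]

# z-scores: distance from the mean in units of standard deviation.
z_scores = (data - data.mean()) / data.std()
z_outliers = data[np.abs(z_scores) > 3]

print(lower_limit, upper_limit, iqr_outliers, z_outliers)
```

Note that on very small samples a z-score cutoff of 3 can be unreachable, because the extreme value itself inflates the mean and standard deviation; the IQR limits do not have that problem.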
What are anomalies and outliers? (50 xp)
Print a 5-number summary (100 xp)
Histograms for outlier detection (100 xp)
Scatterplots for outlier detection (100 xp)
Box plots and IQR (50 xp)
Boxplots for outlier detection (100 xp)
Calculating outlier limits with IQR (100 xp)
Using outlier limits for filtering (100 xp)
Using z-scores for Anomaly Detection (50 xp)
Finding outliers with z-scores (100 xp)
Using modified z-scores with PyOD (100 xp)
2. Isolation Forests with PyOD
In this chapter, you’ll learn the ins and outs of how the Isolation Forest algorithm works. Explore how Isolation Trees are built, the essential parameters of PyOD's IForest and how to tune them, and how to interpret the output of IForest using outlier probability scores.
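To give a feel for the two parameters this chapter highlights, `n_estimators` and `contamination`, here is a minimal sketch. It uses scikit-learn's `IsolationForest` rather than PyOD's `IForest`, which is a substitution made for this example; the synthetic data and the parameter values are also assumptions.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic data (made up): a Gaussian cloud plus two planted anomalies.
rng = np.random.default_rng(0)
inliers = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
planted = np.array([[8.0, 8.0], [-9.0, 7.5]])
X = np.vstack([inliers, planted])

# n_estimators: number of isolation trees in the forest.
# contamination: expected share of outliers, which sets the decision cutoff.
model = IsolationForest(n_estimators=100, contamination=0.01, random_state=0)

labels = model.fit_predict(X)    # scikit-learn convention: -1 outlier, 1 inlier
scores = model.score_samples(X)  # lower score means more anomalous

print(np.flatnonzero(labels == -1))
```

Raising `contamination` lowers the bar for what counts as an outlier, so it is usually worth checking how the flagged set changes across a few values instead of trusting a single setting.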
Getting started with Isolation Forests (50 xp)
The difference between univariate and multivariate anomalies (50 xp)
Detecting outliers with IForest (100 xp)
Overview of Isolation Forest hyperparameters (50 xp)
Most important IForest parameters (50 xp)
Choosing contamination (100 xp)
Choosing n_estimators (100 xp)
Checking the theory (50 xp)
Hyperparameter tuning of Isolation Forest (50 xp)
Tuning contamination (100 xp)
Tuning multiple hyperparameters (100 xp)
Interpreting the output of IForest (50 xp)
Alternative way of classifying with IForest (100 xp)
Using outlier probabilities (100 xp)
3. Distance and Density-based Algorithms
After a tree-based outlier classifier, you will explore a class of distance and density-based detectors. KNN and Local Outlier Factor classifiers have been proven highly effective in this area, and you will learn how to use them.
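The distance-based idea behind a KNN detector can be sketched without any detection library: score each point by how far away its nearest neighbors are. The function below is a hand-rolled illustration, not PyOD's implementation; its name, its parameters, and the synthetic data are assumptions for this example.

```python
import numpy as np

def knn_outlier_scores(X, n_neighbors=5, metric="euclidean", method="largest"):
    """Score each row of X by the distance to its nearest neighbors."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]          # all pairwise differences
    if metric == "euclidean":
        dist = np.sqrt((diff ** 2).sum(axis=-1))
    elif metric == "manhattan":
        dist = np.abs(diff).sum(axis=-1)
    else:
        raise ValueError(f"unsupported metric: {metric}")
    # After sorting each row, column 0 is the zero distance of a point to itself.
    neighbor_dist = np.sort(dist, axis=1)[:, 1:n_neighbors + 1]
    if method == "largest":                       # distance to the k-th neighbor
        return neighbor_dist[:, -1]
    if method == "mean":                          # average over the k neighbors
        return neighbor_dist.mean(axis=1)
    if method == "median":
        return np.median(neighbor_dist, axis=1)
    raise ValueError(f"unsupported method: {method}")

# Synthetic data (made up): a Gaussian cloud plus one distant point at index 100.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)), [[10.0, 10.0]]])

scores_largest = knn_outlier_scores(X, n_neighbors=5, method="largest")
scores_mean = knn_outlier_scores(X, n_neighbors=5, metric="manhattan", method="mean")
print(np.argmax(scores_largest), np.argmax(scores_mean))
```

Because the scores are raw distances, features on very different scales would dominate the result, which is one reason feature scaling matters for distance-based detectors.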
KNN for outlier detection (50 xp)
KNN for the first time (100 xp)
KNN with outlier probabilities (100 xp)
Outlier-robust feature scaling (50 xp)
Finding the euclidean distance manually (100 xp)
Finding the euclidean distance with SciPy (100 xp)
Practicing standardization (100 xp)
Testing QuantileTransformer (100 xp)
Hyperparameters of KNN (50 xp)
Differentiating distance metrics (100 xp)
Calculating manhattan distance manually (100 xp)
Tuning n_neighbors (100 xp)
Tuning the aggregation method (100 xp)
Local Outlier Factor (50 xp)
LOF for the first time (100 xp)
LOF with outlier probabilities (100 xp)
4. Time Series Anomaly Detection and Outlier Ensembles
In this chapter, you’ll learn how to perform anomaly detection on time series datasets and make your predictions more stable and trustworthy using outlier ensembles.
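One way to approach time series anomalies is to remove the predictable part of the series and look for outliers in what remains. The sketch below is a simplified stand-in for a full decomposition: it assumes a toy series with a weekly pattern and no trend, removes only the seasonal component, and scores the residuals with MAD. A real workflow would more likely rely on a library routine such as statsmodels' `seasonal_decompose`; all data and thresholds here are invented for illustration.

```python
import numpy as np

# Toy series (made up): 12 weekly cycles of a sine pattern plus noise.
period = 7
n_points = 12 * period
t = np.arange(n_points)
rng = np.random.default_rng(42)
series = 5.0 * np.sin(2 * np.pi * t / period) + rng.normal(0.0, 0.3, n_points)

# Plant one anomaly so there is something to find.
spike_index = 40
series[spike_index] += 8.0

# Seasonal component: the median value at each position within the cycle.
cycles = series.reshape(-1, period)           # rows are cycles, columns are phases
seasonal_profile = np.median(cycles, axis=0)
seasonal = np.tile(seasonal_profile, n_points // period)

# Residuals are what the seasonal pattern does not explain.
residuals = series - seasonal

# Score residuals with modified z-scores (median and MAD).
median = np.median(residuals)
mad = np.median(np.abs(residuals - median))
scores = 0.6745 * (residuals - median) / mad
flagged = np.flatnonzero(np.abs(scores) > 3.5)  # 3.5 is an assumed cutoff

print(flagged)
```

Scoring the raw series instead of the residuals would mostly measure the seasonal swing itself, so an anomaly that stays within the usual range of the series could go unnoticed.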
Introduction to time series (50 xp)
Working with DateTime columns (100 xp)
Creating a DateTimeIndex (100 xp)
MAD on time series (100 xp)
Isolation Forest on time series (100 xp)
Time Series Decomposition for Outlier Detection (50 xp)
Practicing decomposition (100 xp)
Fitting on residuals (100 xp)
Outlier classifier ensembles (50 xp)
Scaling parts of a dataset (100 xp)
Manual outlier ensembles - creating the arrays (100 xp)
Storing outlier probabilities (100 xp)
Aggregating and thresholding the probabilities (100 xp)
How to deal with identified outliers (50 xp)
Classifying the reasons for outlier presence (100 xp)
When to drop outliers (100 xp)
Non-aggressive methods of dealing with outliers (100 xp)
Congratulations! (50 xp)
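The manual ensemble steps named in the exercise list (storing outlier probabilities, then aggregating and thresholding them) might look roughly like the sketch below. It is an assumption-laden illustration: the three "detectors" are simple k-th-neighbor distance scorers with different `k`, min-max scaling stands in for detector-provided outlier probabilities, and the 0.5 threshold is arbitrary.

```python
import numpy as np

def min_max_scale(scores):
    """Map raw scores to [0, 1] as a crude stand-in for outlier probabilities."""
    scores = np.asarray(scores, dtype=float)
    span = scores.max() - scores.min()
    if span == 0:
        return np.zeros_like(scores)
    return (scores - scores.min()) / span

def kth_neighbor_distance(X, k):
    """Euclidean distance from each point to its k-th nearest neighbor."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return np.sort(dist, axis=1)[:, k]        # column 0 is the point itself

# Synthetic data (made up): a Gaussian cloud plus one distant point (last row).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)), [[9.0, 9.0]]])

# One column of probabilities per detector in the ensemble.
neighbor_counts = [3, 5, 10]
probabilities = np.column_stack(
    [min_max_scale(kth_neighbor_distance(X, k)) for k in neighbor_counts]
)

# Aggregate across detectors, then threshold the combined probability.
mean_probability = probabilities.mean(axis=1)
is_outlier = mean_probability > 0.5

print(np.flatnonzero(is_outlier))
```

Averaging means a point is flagged only when the detectors broadly agree; swapping the mean for a median or a maximum trades off caution against sensitivity.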
Prerequisites: Supervised Learning with scikit-learn
Bex Tuychiyev
Kaggle Master, Data Science Content Creator
Bex is a Top 10 AI writer on Medium and a Kaggle Master with over 10k followers. He loves writing detailed guides, tutorials, and notebooks on complex data science and machine learning topics with a bit of a sarcastic style.