
course

Dimensionality Reduction in Python

Intermediate

4.5+

Updated 12/2024

Understand the concept of reducing dimensionality in your data, and master the techniques to do so in Python.


Included for free with Premium or Teams

Python | Machine Learning | 4 hours | 16 videos | 58 exercises | 4,700 XP | 30,870 | Statement of Accomplishment


Course Description

High-dimensional datasets can be overwhelming and leave you not knowing where to start. Typically, you’d explore a new dataset visually first, but when there are too many dimensions, the classical approaches fall short. Fortunately, there are visualization techniques designed specifically for high-dimensional data, and you’ll be introduced to them in this course. After exploring the data, you’ll often find that many features hold little information, either because they show little or no variance or because they duplicate other features. You’ll learn how to detect these features and drop them from the dataset so that you can focus on the informative ones. Next, you might build a model on the remaining features and find that some of them have no effect on the target you’re trying to predict. You’ll learn how to detect and drop these irrelevant features too, reducing dimensionality and thus complexity. Finally, you’ll learn how feature extraction techniques can reduce dimensionality for you by calculating uncorrelated principal components.
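The feature-selection and feature-extraction steps described above can be sketched with pandas and scikit-learn. This is a minimal illustration, not the course’s own code: the toy DataFrame and its column names (`a`, `b`, `constant`, `b_copy`) are hypothetical, and the 0.95 correlation cutoff is an arbitrary assumption.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.feature_selection import VarianceThreshold

# Hypothetical toy data: "constant" has no variance and "b_copy" duplicates "b"
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "a": rng.normal(size=100),
    "b": rng.normal(size=100),
    "constant": np.ones(100),
})
df["b_copy"] = df["b"] * 2.0  # perfectly correlated with "b"

# Step 1: drop features without variance
selector = VarianceThreshold(threshold=0.0)
selector.fit(df)
df_var = df.loc[:, selector.get_support()]

# Step 2: drop one feature from every pair whose absolute correlation exceeds 0.95
# (assumed cutoff); only the upper triangle is checked so each pair is seen once
corr = df_var.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.95).any()]
df_reduced = df_var.drop(columns=to_drop)

# Step 3: feature extraction with PCA on standardized features
scaled = (df_reduced - df_reduced.mean()) / df_reduced.std()
pca = PCA()
components = pca.fit_transform(scaled)

print(list(df_reduced.columns))  # remaining features
# Correlation matrix of the principal components is (numerically) the identity
print(np.round(np.corrcoef(components, rowvar=False), 6))
print(pca.explained_variance_ratio_)
```

`VarianceThreshold(threshold=0.0)` keeps only features with non-zero variance. Standardizing before PCA keeps features measured on larger scales from dominating the components; because the data are projected onto the eigenvectors of their covariance matrix, the resulting components are uncorrelated with each other.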

Prerequisites

Supervised Learning with scikit-learn

1

Exploring High-Dimensional Data

Introduction

Finding the number of dimensions in a dataset

Removing features without variance

Feature selection vs. feature extraction

Visually detecting redundant features

Advantage of feature selection

t-SNE visualization of high-dimensional data

t-SNE intuition

Fitting t-SNE to the ANSUR data

t-SNE visualization of dimensionality

2

Feature Selection I - Selecting for Feature Information

The curse of dimensionality

Train - test split

Fitting and testing the model

Accuracy after dimensionality reduction

Features with missing values or little variance

Finding a good variance threshold

Features with low variance

Removing features with many missing values

Pairwise correlation

Correlation intuition

Inspecting the correlation matrix

Visualizing the correlation matrix

Removing highly correlated features

Filtering out highly correlated features

Nuclear energy and pool drownings

3

Feature Selection II - Selecting for Model Accuracy

Selecting features for model performance

Building a diabetes classifier

Manual Recursive Feature Elimination

Automatic Recursive Feature Elimination

Tree-based feature selection

Building a random forest model

Random forest for feature selection

Recursive Feature Elimination with random forests

Regularized linear regression

Creating a LASSO regressor

Lasso model results

Adjusting the regularization strength

Combining feature selectors

Creating a LassoCV regressor

Ensemble models for extra votes

Combining 3 feature selectors

4

Feature Extraction

Feature extraction

Manual feature extraction I

Manual feature extraction II

Principal component intuition

Principal component analysis

Calculating Principal Components

PCA on a larger dataset

PCA explained variance

PCA applications

Understanding the components

PCA for feature exploration

PCA in a model pipeline

Principal Component selection

Selecting the proportion of variance to keep

Choosing the number of components

PCA for image compression

Congratulations!


Earn Statement of Accomplishment

Add this credential to your LinkedIn profile, resume, or CV
Share it on social media and in your performance review

Included with Premium or Teams

Don’t just take our word for it

4.5 from 12 reviews

5 stars: 75%
4 stars: 0%
3 stars: 25%
2 stars: 0%
1 star: 0%


Orlando R.

19 days

It really was a great course. The instructor was very savvy and the explanations were on point. I learned A LOT!

Freddy C.

10 months

It was a great course, I would use part of the content during my data analytics classes.

Bryce Y.

about 1 year

Very practical course with the right balance of breadth vs detail

HARPREET S.

over 1 year

concepts delivered

Ankush B.

over 1 year

Topics are very well explained in the course.



Join over 15 million learners and start Dimensionality Reduction in Python today!
