
course

Preprocessing for Machine Learning in Python

Intermediate

4.6+

Updated 12/2024

Learn how to clean and prepare your data for machine learning!


Python
Machine Learning
4 hours
20 videos
62 exercises
4,700 XP
51,924
Statement of Accomplishment


Course Description

This course covers the basics of how and when to perform data preprocessing, the essential step in any machine learning project where you get your data ready for modeling. Preprocessing comes into play after you import and clean your data and before you fit your machine learning model. You'll learn how to standardize your data so that it's in the right form for your model, create new features to best leverage the information in your dataset, and select the best features to improve your model fit. Finally, you'll practice preprocessing by getting a dataset on UFO sightings ready for modeling.
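As a rough illustration of the three steps named in the description (standardizing, creating features, selecting features), here is a minimal sketch in plain Python. The toy records, column names, and helper functions are all hypothetical and invented for this example; they are stand-ins for the ideas, not the library calls or datasets the course itself uses.

```python
from statistics import mean, pstdev

# Hypothetical toy records, invented for illustration only.
rows = [
    {"length_km": 2.0, "difficulty": "easy", "country": "US"},
    {"length_km": 10.0, "difficulty": "hard", "country": "US"},
    {"length_km": 6.0, "difficulty": "easy", "country": "US"},
]

def standardize(values):
    """Rescale values to mean 0 and population standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def one_hot(values):
    """Create one 0/1 column per distinct category."""
    categories = sorted(set(values))
    return {c: [1 if v == c else 0 for v in values] for c in categories}

def drop_constant(features):
    """Remove columns that hold a single distinct value."""
    return {name: col for name, col in features.items() if len(set(col)) > 1}

# Step 1: standardize the numeric column.
features = {"length_km": standardize([r["length_km"] for r in rows])}

# Step 2: engineer new features by one-hot encoding the categorical columns.
for column in ("difficulty", "country"):
    for category, col in one_hot([r[column] for r in rows]).items():
        features[f"{column}_{category}"] = col

# Step 3: a very simple form of feature selection.
features = drop_constant(features)

print(sorted(features))
```

In this toy data every row has the same country, so the encoded country column is constant and gets dropped, while the standardized length and the two difficulty indicators are kept.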

Prerequisites

Cleaning Data in Python
Supervised Learning with scikit-learn

1

Introduction to Data Preprocessing

Introduction to preprocessing

Exploring missing data

Dropping missing data

Working with data types

Exploring data types

Converting a column type

Training and test sets

Class imbalance

Stratified sampling

2

Standardizing Data

Standardization

When to standardize

Modeling without normalizing

Log normalization

Checking the variance

Log normalization in Python

Scaling data for feature comparison

Scaling data - investigating columns

Scaling data - standardizing columns

Standardized data and modeling

KNN on non-scaled data

KNN on scaled data

3

Feature Engineering

Feature engineering

Feature engineering knowledge test

Identifying areas for feature engineering

Encoding categorical variables

Encoding categorical variables - binary

Encoding categorical variables - one-hot

Engineering numerical features

Aggregating numerical features

Extracting datetime components

Engineering text features

Extracting string patterns

Vectorizing text

Text classification using tf/idf vectors

4

Selecting Features for Modeling

Feature selection

When to use feature selection

Identifying areas for feature selection

Removing redundant features

Selecting relevant features

Checking for correlated features

Selecting features using text vectors

Exploring text vectors, part 1

Exploring text vectors, part 2

Training Naive Bayes with feature selection

Dimensionality reduction

Training a model with PCA

5

Putting It All Together

UFOs and preprocessing

Checking column types

Dropping missing data

Categorical variables and standardization

Extracting numbers from strings

Identifying features for standardization

Engineering new features

Encoding categorical variables

Features from dates

Text vectorization

Feature selection and modeling

Selecting the ideal dataset

Modeling the UFO dataset, part 1

Modeling the UFO dataset, part 2

Congratulations!


Earn Statement of Accomplishment

Add this credential to your LinkedIn profile, resume, or CV
Share it on social media and in your performance review

Included with Premium or Teams

Don’t just take our word for it

4.6 from 20 reviews

Rating distribution: 75%, 20%, 0%, 5%, 0%


Orlando R.

5 days

Very well structured and concise. Good foundations on this topic.

Juan-Carlos V.

15 days

Better workflow explanation with a final example. The PCA is very simple but does not show how many components is considering and only gives the final value.

Gavin S.

20 days

Great

Noel C.

3 months

Five stars

Ankush B.

5 months

Excellent course for preprocessing data in Python before performing Machine Learning.

