
Preprocessing for Machine Learning in Python

4.6+
19 reviews
Intermediate

Learn how to clean and prepare your data for machine learning!

4 hours · 20 videos · 62 exercises · 51,501 learners · Statement of Accomplishment


Course Description

This course covers the basics of how and when to perform data preprocessing, the essential step in any machine learning project in which you get your data ready for modeling. Preprocessing comes into play after you have imported and cleaned your data and before you fit your machine learning model. You'll learn how to standardize your data so that it's in the right form for your model, create new features that make the most of the information in your dataset, and select the best features to improve your model fit. Finally, you'll practice preprocessing by getting a dataset on UFO sightings ready for modeling.
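As a rough illustration of two of the steps named above, standardizing a feature and creating a new one, here is a minimal sketch that uses only the Python standard library. The toy records and the field names (`length_km`, `visited`, `visit_month`) are hypothetical and are not taken from the course datasets; libraries such as scikit-learn offer equivalent ready-made tools.

```python
from datetime import date
from statistics import fmean, pstdev

# Hypothetical toy records; not taken from the course datasets.
rows = [
    {"length_km": 2.0, "visited": date(2021, 1, 15)},
    {"length_km": 4.0, "visited": date(2021, 6, 3)},
    {"length_km": 6.0, "visited": date(2021, 6, 20)},
    {"length_km": 8.0, "visited": date(2021, 12, 1)},
]


def standardize(values):
    """Rescale values to mean 0 and population standard deviation 1."""
    mean = fmean(values)
    std = pstdev(values)
    if std == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(v - mean) / std for v in values]


# Standardization: put the numeric feature on a common scale.
scaled = standardize([r["length_km"] for r in rows])

# Feature creation: derive the month from the raw date.
for r, z in zip(rows, scaled):
    r["length_scaled"] = z
    r["visit_month"] = r["visited"].month

for r in rows:
    print(r["visit_month"], round(r["length_scaled"], 3))
```

After this runs, the scaled values have mean 0 and standard deviation 1, and each record carries a month feature that a model can use more easily than a raw date.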

In the following Tracks

Certification Available

Data Scientist in Python


Machine Learning Scientist in Python

  1. Introduction to Data Preprocessing

    Free

    In this chapter you'll learn exactly what it means to preprocess data. You'll take the first steps in any preprocessing journey, including exploring data types and dealing with missing data.

    Introduction to preprocessing (50 xp)
    Exploring missing data (50 xp)
    Dropping missing data (100 xp)
    Working with data types (50 xp)
    Exploring data types (50 xp)
    Converting a column type (100 xp)
    Training and test sets (50 xp)
    Class imbalance (50 xp)
    Stratified sampling (100 xp)
  2. Standardizing Data

    This chapter is all about standardizing data. Often a model will make some assumptions about the distribution or scale of your features. Standardization is a way to make your data fit these assumptions and improve the algorithm's performance.

  4. Selecting Features for Modeling

    This chapter goes over a few different techniques for selecting the most important features from your dataset. You'll learn how to drop redundant features, work with text vectors, and reduce the number of features in your dataset using principal component analysis (PCA).

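Several of the steps named in the chapters above (dropping missing data, converting a column type, stratified sampling, and PCA) can be sketched as follows. This is an illustrative sketch only: it assumes pandas, NumPy, and scikit-learn are installed, and the synthetic data and column names (`f1` to `f4`, `n_visits`, `label`) are hypothetical rather than drawn from the course datasets.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data; column names are illustrative only.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "f1": rng.normal(size=n),
    "f2": rng.normal(size=n),
    "f3": rng.normal(size=n),
    "n_visits": [str(i % 7) for i in range(n)],            # numbers stored as strings
    "label": [1 if i % 5 == 0 else 0 for i in range(n)],   # imbalanced: 20% positives
})
df["f4"] = 2 * df["f1"] + rng.normal(scale=0.01, size=n)  # nearly redundant with f1
df.loc[[3, 17, 42], "f2"] = np.nan                         # inject missing values

# Explore and drop missing data, then convert a column type.
print(df.isna().sum())
df = df.dropna(subset=["f2"])
df["n_visits"] = df["n_visits"].astype(int)

# Stratified split: keeps the class ratio similar in the train and test sets.
X = df.drop(columns="label")
y = df["label"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# PCA: fit on the training set only, then apply the same projection to the test set.
# In practice the features would usually be standardized first.
numeric_cols = ["f1", "f2", "f3", "f4"]
pca = PCA(n_components=2)
X_train_pca = pca.fit_transform(X_train[numeric_cols])
X_test_pca = pca.transform(X_test[numeric_cols])

# Share of the total variance captured by each retained component.
print(pca.explained_variance_ratio_)
```

Printing `explained_variance_ratio_` shows how much variance each retained component explains, which is one way to judge whether the chosen number of components is reasonable.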

Datasets

Hiking data, Wine data, UFO sightings data, Volunteering data

Collaborators

Nick Solomon
Kara Woo
James Chapman

Data Science & AI Curriculum Manager, DataCamp

James is a Curriculum Manager at DataCamp, where he collaborates with experts from industry and academia to create courses on AI, data science, and analytics. He has led nine DataCamp courses on diverse topics in Python, R, AI developer tooling, and Google Sheets. He has a Master's degree in Physics and Astronomy from Durham University, where he specialized in high-redshift quasar detection. In his spare time, he enjoys restoring retro toys and electronics.


Don’t just take our word for it

4.6 from 19 reviews
5 stars: 74%
4 stars: 21%
3 stars: 0%
2 stars: 5%
1 star: 0%
  • Juan-Carlos V.
    2 days

    Better workflow explanation with a final example. The PCA is very simple but does not show how many components is considering and only gives the final value.

  • Gavin S.
    6 days

    Great

  • Noel C.
    3 months

    Five stars

  • Ankush B.
    5 months

    Excellent course for preprocessing data in Python before performing Machine Learning.

  • Samuel H.
    7 months

    Excellent review of the fundamentals.


Join over 15 million learners and start Preprocessing for Machine Learning in Python today!
