Feature Engineering for Machine Learning in Python

Intermediate

Updated 12/2024

Create new features to improve the performance of your Machine Learning models.

Python | Machine Learning | 4 hours | 16 videos | 53 exercises | 4,350 XP | 31,681 | Statement of Accomplishment

Course Description

Every day you read about breakthroughs in how the newest applications of machine learning are changing the world. This reporting often glosses over the large amount of data munging and feature engineering that must be done before any of these models can be used. In this course, you will learn how to do just that. You will work with the Stack Overflow Developer Survey and historic US presidential inauguration addresses to understand how best to preprocess and engineer features from categorical, continuous, and unstructured data. The course gives you hands-on experience preparing data for your own machine learning models.
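As a rough illustration of the three kinds of data named above, the sketch below derives one feature from each: a one-hot encoding for a categorical column, a log transform for a continuous column, and a word count for free text. It is a minimal sketch in plain Python, not the course's own code; the toy records, column names, and helper function are hypothetical, and in practice libraries such as pandas and scikit-learn provide these operations.

```python
import math

# Hypothetical toy records; the column names and values are made up for illustration.
rows = [
    {"country": "France", "salary": 40000.0, "text": "We the people"},
    {"country": "India", "salary": 90000.0, "text": "Fellow citizens of the nation"},
    {"country": "France", "salary": 55000.0, "text": "My fellow citizens"},
]


def one_hot(values, prefix):
    """Return one dict of 0/1 indicators per value, with one key per distinct category."""
    categories = sorted(set(values))
    return [{f"{prefix}_{c}": int(v == c) for c in categories} for v in values]


# Categorical: expand the country column into indicator columns.
dummies = one_hot([row["country"] for row in rows], prefix="country")

features = []
for row, indicator in zip(rows, dummies):
    features.append(
        {
            **indicator,
            # Continuous: log(1 + x) compresses a long right tail.
            "log_salary": math.log1p(row["salary"]),
            # Unstructured text: a simple high-level feature, the number of words.
            "n_words": len(row["text"].split()),
        }
    )

for feature_row in features:
    print(feature_row)
```

Each resulting row is purely numeric, which is the form most machine learning models expect as input.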

Prerequisites

Supervised Learning with scikit-learn

1

Creating Features

Why generate features?

Getting to know your data

Selecting specific data types

Dealing with categorical features

One-hot encoding and dummy variables

Dealing with uncommon categories

Numeric variables

Binarizing columns

Binning values

2

Dealing with Messy Data

Why do missing values exist?

How sparse is my data?

Finding the missing values

Dealing with missing values (I)

Listwise deletion

Replacing missing values with constants

Dealing with missing values (II)

Filling continuous missing values

Imputing values in predictive models

Dealing with other data issues

Dealing with stray characters (I)

Dealing with stray characters (II)

Method chaining

3

Conforming to Statistical Assumptions

Data distributions

What does your data look like? (I)

What does your data look like? (II)

When don't you have to transform your data?

Scaling and transformations

Normalization

Standardization

Log transformation

When can you use normalization?

Removing outliers

Percentage based outlier removal

Statistical outlier removal

Scaling and transforming new data

Train and testing transformations (I)

Train and testing transformations (II)

4

Dealing with Text Data

Encoding text

Cleaning up your text

High level text features

Word counts

Counting words (I)

Counting words (II)

Limiting your features

Text to DataFrame

Term frequency-inverse document frequency

Inspecting Tf-idf values

Transforming unseen data

Using longer n-grams

Finding the most common words
