Feature Engineering for Machine Learning in Python

Create new features to improve the performance of your Machine Learning models.

4 hours | 16 videos | 53 exercises | 31,338 learners | Statement of Accomplishment

Course Description

Every day you read about breakthroughs in how new applications of machine learning are changing the world. This reporting often glosses over the large amount of data munging and feature engineering that must be done before any of these models can be used. In this course, you will learn how to do just that. You will work with the Stack Overflow Developer Survey and historic US presidential inauguration addresses to understand how best to preprocess and engineer features from categorical, continuous, and unstructured data. This course gives you hands-on experience preparing data for your own machine learning models.

In the following tracks

Machine Learning Scientist with Python

1
Creating Features
Free
In this chapter, you will explore what feature engineering is and how to get started with applying it to real-world data. You will load, explore and visualize a survey response dataset, and in doing so you will learn about its underlying data types and why they have an influence on how you should engineer your features. Using the pandas package you will create new features from both categorical and continuous columns.
Why generate features?
50 xp
Getting to know your data
100 xp
Selecting specific data types
100 xp
Dealing with categorical features
50 xp
One-hot encoding and dummy variables
100 xp
Dealing with uncommon categories
100 xp
Numeric variables
50 xp
Binarizing columns
100 xp
Binning values
100 xp
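
The techniques listed in this chapter can be sketched with pandas roughly as follows. This is a minimal illustration, not the course's own exercise code: the DataFrame, its column names (`Country`, `YearsExperience`), the rarity threshold, and the bin edges are all assumptions made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical survey-style data; column names and values are illustrative only.
df = pd.DataFrame({
    "Country": ["USA", "India", "USA", "France", "India", "Spain"],
    "YearsExperience": [0, 3, 12, 7, 1, 25],
})

# Uncommon categories: group countries seen fewer than 2 times under "Other".
counts = df["Country"].value_counts()
rare = counts[counts < 2].index
df["Country"] = df["Country"].where(~df["Country"].isin(rare), "Other")

# One-hot encoding: one indicator column per remaining category.
encoded = pd.get_dummies(df, columns=["Country"], prefix="C")

# Binarizing: flag respondents with any experience at all.
encoded["HasExperience"] = (encoded["YearsExperience"] > 0).astype(int)

# Binning: map the continuous column onto labeled ranges
# (-inf, 2], (2, 10], (10, inf).
encoded["ExperienceLevel"] = pd.cut(
    encoded["YearsExperience"],
    bins=[-np.inf, 2, 10, np.inf],
    labels=["Junior", "Mid", "Senior"],
)

print(encoded)
```

Grouping rare categories before encoding keeps the number of indicator columns small, which is one reason the two steps are often applied in that order.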
2
Dealing with Messy Data
This chapter introduces you to the reality of messy and incomplete data. You will learn how to find where your data has missing values and explore multiple approaches to dealing with them. You will also use string manipulation techniques to remove unwanted characters from your dataset.
Why do missing values exist?
50 xp
How sparse is my data?
100 xp
Finding the missing values
100 xp
Dealing with missing values (I)
50 xp
Listwise deletion
100 xp
Replacing missing values with constants
100 xp
Dealing with missing values (II)
50 xp
Filling continuous missing values
100 xp
Imputing values in predictive models
50 xp
Dealing with other data issues
50 xp
Dealing with stray characters (I)
100 xp
Dealing with stray characters (II)
100 xp
Method chaining
100 xp
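
As a rough sketch of the steps this chapter covers (locating missing values, listwise deletion, constant and mean filling, and stripping stray characters with chained string methods), the pandas calls below are one possible approach. The column names (`Gender`, `Rating`, `RawSalary`) and values are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical messy data; names and values are illustrative only.
df = pd.DataFrame({
    "Gender": ["Male", None, "Female", "Male"],
    "Rating": [7.0, np.nan, 9.0, np.nan],
    "RawSalary": ["$70,841.00", "£21,426.00", "$54,000.00", "$91,000.00"],
})

# How sparse is the data? Count missing values per column.
missing_per_column = df.isna().sum()

# Listwise deletion: drop every row with at least one missing value.
complete_rows = df.dropna(how="any")

# Replace missing categorical values with a constant label.
df["Gender"] = df["Gender"].fillna("Not Given")

# Fill missing continuous values with the column mean.
df["Rating"] = df["Rating"].fillna(df["Rating"].mean())

# Method chaining: remove separators and currency symbols, then cast to float.
df["Salary"] = (
    df["RawSalary"]
    .str.replace(",", "", regex=False)
    .str.replace("$", "", regex=False)
    .str.replace("£", "", regex=False)
    .astype(float)
)

print(missing_per_column)
print(df)
```

Passing `regex=False` matters here because `$` is a regular-expression metacharacter; treating it as a literal string avoids a silent no-op.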
3
Conforming to Statistical Assumptions
In this chapter, you will focus on analyzing the underlying distribution of your data and whether it will impact your machine learning pipeline. You will learn how to deal with skewed data and situations where outliers may be negatively impacting your analysis.
Data distributions
50 xp
What does your data look like? (I)
100 xp
What does your data look like? (II)
100 xp
When don't you have to transform your data?
50 xp
Scaling and transformations
50 xp
Normalization
100 xp
Standardization
100 xp
Log transformation
100 xp
When can you use normalization?
50 xp
Removing outliers
50 xp
Percentage based outlier removal
100 xp
Statistical outlier removal
100 xp
Scaling and transforming new data
50 xp
Train and testing transformations (I)
100 xp
Train and testing transformations (II)
100 xp
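
The key idea behind applying transformations to train and test sets is that any statistic (mean, standard deviation, outlier bounds) is learned from the training data only and then reused on new data. The sketch below shows this with plain pandas and NumPy; the `Salary` column and its values are hypothetical, and the 3-standard-deviation cutoff is an assumed, commonly used threshold rather than a rule taken from the course.

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed column, already split into train and test sets.
train = pd.DataFrame({"Salary": [30_000.0, 42_000.0, 55_000.0, 61_000.0, 250_000.0]})
test = pd.DataFrame({"Salary": [48_000.0, 2_000_000.0]})

# Log transformation compresses the long right tail.
train["LogSalary"] = np.log(train["Salary"])
test["LogSalary"] = np.log(test["Salary"])

# Learn the statistics from the training set only.
train_mean = train["LogSalary"].mean()
train_std = train["LogSalary"].std()

# Standardization: apply the *training* mean and std to both sets.
train["ScaledLogSalary"] = (train["LogSalary"] - train_mean) / train_std
test["ScaledLogSalary"] = (test["LogSalary"] - train_mean) / train_std

# Statistical outlier removal: keep rows within 3 training standard
# deviations of the training mean.
lower = train_mean - 3 * train_std
upper = train_mean + 3 * train_std
test_trimmed = test[test["LogSalary"].between(lower, upper)]

print(test_trimmed)
```

scikit-learn's scalers, such as `StandardScaler`, follow the same pattern: `fit` on the training set, then `transform` both sets.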
4
Dealing with Text Data
Finally, in this chapter, you will work with unstructured text data and learn ways to engineer columnar features out of a text corpus. You will compare how different approaches affect how much context is extracted from a text, and how to balance the need for context against creating too many features.
Encoding text
50 xp
Cleaning up your text
100 xp
High level text features
100 xp
Word counts
50 xp
Counting words (I)
100 xp
Counting words (II)
100 xp
Limiting your features
100 xp
Text to DataFrame
100 xp
Term frequency-inverse document frequency
50 xp
Tf-idf
100 xp
Inspecting Tf-idf values
100 xp
Transforming unseen data
100 xp
N-grams
50 xp
Using longer n-grams
100 xp
Finding the most common words
100 xp
Wrap-up
50 xp

Datasets

Stack Overflow Survey Responses (Modified)

US Presidential Inauguration Addresses

Collaborators

Sumedh Panchadhar

Hillary Green-Lerman

Prerequisites

Supervised Learning with scikit-learn

Robert O'Callaghan

Director of Data Science, Ordergroove
