Working with Categorical Data in Python
Learn how to manipulate and visualize categorical data using pandas and seaborn.
4 hours, 15 videos, 52 exercises, 22,220 learners, Statement of Accomplishment
Course Description
Being able to understand, use, and summarize non-numerical data—such as a person’s blood type or marital status—is a vital component of being a data scientist. In this course, you’ll learn how to manipulate and visualize categorical data using pandas and seaborn. Through hands-on exercises, you’ll get to grips with pandas' categorical data type, including how to create, delete, and update categorical columns. You’ll also work with a wide range of datasets including the characteristics of adoptable dogs, Las Vegas trip reviews, and census data to develop your skills at working with categorical data.
- 1
Introduction to Categorical Data
Free
Almost every dataset contains categorical information, and often it’s an unexplored goldmine of information. In this chapter, you’ll learn how pandas handles categorical columns using the data type category. You’ll also discover how to group data by categories to unearth great summary statistics.
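As a rough illustration of the two ideas in this chapter (the category data type and grouping by category), here is a minimal sketch. The `dogs` table, its column names, and its values are made up for this example and are not taken from the course datasets.

```python
import pandas as pd

# Hypothetical data: a small table of adoptable dogs (values invented for illustration).
dogs = pd.DataFrame({
    "size": ["small", "medium", "large", "medium", "small", "large"] * 1000,
    "age_years": [2, 5, 7, 3, 1, 9] * 1000,
})

# Memory used while the column is still stored as plain strings.
string_bytes = dogs["size"].memory_usage(deep=True)

# Convert to the category dtype: each distinct label is stored once,
# and every row holds a small integer code instead of a full string.
dogs["size"] = dogs["size"].astype("category")
category_bytes = dogs["size"].memory_usage(deep=True)

# Group by the categorical column and summarize a numeric column.
mean_age = dogs.groupby("size", observed=True)["age_years"].mean()

print(string_bytes, category_bytes)
print(mean_age)
```

With only three distinct labels repeated many times, the categorical column should take noticeably less memory than the string version; the saving shrinks as the number of distinct values grows.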
  - Course introduction (50 xp)
  - Categorical vs. numerical (100 xp)
  - Exploring a target variable (100 xp)
  - Ordinal categorical variables (100 xp)
  - Categorical data in pandas (50 xp)
  - Setting dtypes and saving memory (100 xp)
  - Creating a categorical pandas Series (100 xp)
  - Setting dtype when reading data (100 xp)
  - Grouping data by category in pandas (50 xp)
  - Create lots of groups (50 xp)
  - Setting up a .groupby() statement (100 xp)
  - Using pandas functions effectively (100 xp)
- 2
Categorical pandas Series
Now it’s time to learn how to set, add, and remove categories from a Series. You’ll also explore how to update, rename, collapse, and reorder categories, before applying your new skills to clean and access other data within your DataFrame.
  - Setting category variables (50 xp)
  - Setting categories (100 xp)
  - Adding categories (100 xp)
  - Removing categories (100 xp)
  - Updating categories (50 xp)
  - Collapsing categories knowledge check (50 xp)
  - Renaming categories (100 xp)
  - Collapsing categories (100 xp)
  - Reordering categories (50 xp)
  - Reordering categories in a Series (100 xp)
  - Using .groupby() after reordering (100 xp)
  - Cleaning and accessing data (50 xp)
  - Cleaning variables (100 xp)
  - Accessing and filtering data (100 xp)
- 3
Visualizing Categorical Data
In this chapter, you’ll use the seaborn Python library to create informative visualizations of categorical data with categorical plots (catplot), including box plots, bar plots, point plots, and count plots. You’ll then learn how to visualize categorical columns and split data across categorical columns to visualize summary statistics of numerical columns.
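A minimal sketch of seaborn's `catplot()` with two of the plot kinds mentioned above. The `reviews` table and its column names are hypothetical stand-ins, not the course's trip-review dataset, and the non-interactive `Agg` backend is chosen only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption so no display is needed

import pandas as pd
import seaborn as sns

# Hypothetical review data (values invented for illustration).
reviews = pd.DataFrame({
    "traveler_type": pd.Categorical(
        ["Solo", "Couples", "Families", "Couples", "Solo", "Families"]
    ),
    "score": [4, 5, 3, 4, 5, 4],
})

# Box plot: distribution of a numeric column within each category.
box = sns.catplot(data=reviews, x="traveler_type", y="score", kind="box")

# Count plot: number of rows in each category.
count = sns.catplot(data=reviews, x="traveler_type", kind="count")

box.savefig("box.png")
count.savefig("count.png")
```

Changing `kind` to `"bar"` or `"point"` gives the other plot types named in this chapter.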
  - Introduction to categorical plots using Seaborn (50 xp)
  - Boxplot understanding (50 xp)
  - Creating a box plot (100 xp)
  - Seaborn bar plots (50 xp)
  - Creating a bar plot (100 xp)
  - Ordering categories (100 xp)
  - Bar plot using hue (100 xp)
  - Point and count plots (50 xp)
  - Creating a point plot (100 xp)
  - Creating a count plot (100 xp)
  - Review catplot() types (100 xp)
  - Additional catplot() options (50 xp)
  - One visualization per group (100 xp)
  - Updating categorical plots (100 xp)
- 4
Pitfalls and Encoding
Lastly, you’ll learn how to overcome the common pitfalls of using categorical data. You’ll also grow your data encoding skills as you are introduced to label encoding and one-hot encoding—perfect for helping you prepare your data for use in machine learning algorithms.
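To make the two encodings concrete, here is a small sketch using pandas only. The `people` table is hypothetical, and using `.cat.codes` for label encoding is one common approach, assumed here for illustration rather than confirmed as the course's method.

```python
import pandas as pd

# Hypothetical census-style data (values invented for illustration).
people = pd.DataFrame({
    "marital_status": ["Married", "Single", "Divorced", "Single"],
    "age": [41, 29, 55, 33],
})

# Label encoding: the category dtype already stores one integer code per label.
people["marital_status"] = people["marital_status"].astype("category")
people["marital_code"] = people["marital_status"].cat.codes

# Save the code -> label mapping so the codes can be decoded later.
code_to_label = dict(enumerate(people["marital_status"].cat.categories))

# One-hot encoding: one indicator column per category, applied only to the
# column named in `columns`; other columns pass through unchanged.
one_hot = pd.get_dummies(
    people[["marital_status", "age"]], columns=["marital_status"]
)

print(code_to_label)
print(one_hot.columns.tolist())
```

Label encoding keeps a single compact column but implies an arbitrary numeric order, while one-hot encoding avoids that implied order at the cost of one extra column per category.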
  - Categorical pitfalls (50 xp)
  - Memory usage knowledge check (50 xp)
  - Overcoming pitfalls: string issues (100 xp)
  - Overcoming pitfalls: using NumPy arrays (100 xp)
  - Label encoding (50 xp)
  - Create a label encoding and map (100 xp)
  - Using saved mappings (100 xp)
  - Creating a Boolean encoding (100 xp)
  - One-hot encoding (50 xp)
  - One-hot knowledge check (50 xp)
  - One-hot encoding specific columns (100 xp)
  - Wrap-up video (50 xp)
Prerequisites
Data Manipulation with pandas
Kasey Jones
Research Data Scientist