
Cleaning Data in R

4.4+
25 reviews
Intermediate

Learn to clean data as quickly and accurately as possible to help your business move from raw data to awesome insights.

4 Hours, 13 Videos, 44 Exercises
46,907 Learners, Statement of Accomplishment

Course Description

Overcome Common Data Problems Like Removing Duplicates in R

It's commonly said that data scientists spend 80% of their time cleaning and manipulating data and only 20% of their time analyzing it. The time spent cleaning is vital since analyzing dirty data can lead you to draw inaccurate conclusions.

In this course, you’ll learn a variety of techniques to help you clean dirty data using R. You’ll start by converting data types, applying range constraints, and dealing with full and partial duplicates to avoid double-counting.
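
As a rough illustration of those three steps, here is a minimal base R sketch. The `rides` data frame, its column names, and the one-day cap on duration are all invented for this example and are not taken from the course materials, which may use different packages.

```r
# Hypothetical bike-ride data; every name and value here is invented.
rides <- data.frame(
  ride_id  = c(1, 2, 2, 3, 4),
  duration = c("12 minutes", "45 minutes", "45 minutes",
               "2000 minutes", "30 minutes"),
  date     = as.Date(c("2020-01-05", "2020-02-10", "2020-02-10",
                       "2020-03-01", "2999-01-01")),
  stringsAsFactors = FALSE
)

# 1. Data type constraint: strip the unit text, then convert to numeric.
rides$duration_min <- as.numeric(sub(" minutes", "", rides$duration, fixed = TRUE))

# 2. Range constraints: cap durations at one day (1440 minutes, an assumed
#    limit) and drop rides dated in the future.
rides$duration_min <- pmin(rides$duration_min, 1440)
rides <- rides[rides$date <= Sys.Date(), ]

# 3. Uniqueness constraint: drop rows that exactly repeat an earlier row.
rides_clean <- rides[!duplicated(rides), ]
```

Capping is only one way to handle an out-of-range value; removing the row or treating the value as missing are equally reasonable, depending on the analysis.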

Delve into Advanced Data Challenges

Once you’ve practiced working on common data issues, you’ll move on to more advanced challenges such as ensuring consistency in measurements and dealing with missing data. After every new concept, you’ll have the chance to complete a hands-on exercise to cement your knowledge and build your experience.
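
To make the measurement-consistency and missing-data ideas concrete, the sketch below uses base R on a small invented `people` data frame. The `unit` column is an assumption made for the example; real data often gives no such hint about which unit was used.

```r
# Hypothetical measurements; names and values are invented.
people <- data.frame(
  name   = c("A", "B", "C", "D"),
  weight = c(70, 154, NA, 82),
  unit   = c("kg", "lb", "kg", "kg"),
  stringsAsFactors = FALSE
)

# Uniformity: convert pounds to kilograms (1 lb = 0.45359237 kg), then relabel.
is_lb <- people$unit == "lb"
people$weight[is_lb] <- people$weight[is_lb] * 0.45359237
people$unit[is_lb] <- "kg"

# Missing data: count the gaps first, then decide how to handle them.
n_missing <- sum(is.na(people$weight))

# Dropping incomplete rows is one simple option; imputation is another.
people_complete <- people[!is.na(people$weight), ]
mean_weight <- mean(people_complete$weight)
```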

Learn to Use Record Linkage During Data Cleaning

Record linkage merges datasets whose values do not match exactly because of typos or different spellings. You’ll explore this technique in the final chapter and practice it by joining two restaurant review datasets into a single dataset.
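
The core idea, matching records on string similarity instead of exact equality, can be sketched with base R's `adist()`, which computes edit distances. The two data frames and the threshold of 2 edits are invented for this example; the course may rely on other packages and similarity measures.

```r
# Hypothetical restaurant data from two sources; all values are invented.
reviews_a <- data.frame(
  name  = c("Cafe Roma", "Blue Ocean Grill", "Taco Palace"),
  stars = c(4, 5, 3),
  stringsAsFactors = FALSE
)
reviews_b <- data.frame(
  name = c("Caffe Roma", "Blue Ocean Gril", "Taco Palace"),
  city = c("Boston", "Miami", "Austin"),
  stringsAsFactors = FALSE
)

# Edit (Levenshtein) distance between every pair of names, ignoring case.
d <- utils::adist(tolower(reviews_a$name), tolower(reviews_b$name))

# For each row of reviews_a, find the closest name in reviews_b.
best <- apply(d, 1, which.min)
best_dist <- d[cbind(seq_len(nrow(d)), best)]

# Accept a link only when the names differ by at most 2 edits
# (an arbitrary threshold chosen for this sketch).
keep <- best_dist <= 2
linked <- cbind(reviews_a[keep, ], city = reviews_b$city[best[keep]])
```

Comparing every pair of names scales quadratically, so larger datasets usually need the candidate pairs narrowed first, for example by comparing only records that share a city.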
  1. Common Data Problems (Free)

    In this chapter, you'll learn how to overcome some of the most common dirty data problems. You'll convert data types, apply range constraints to remove future data points, and remove duplicated data points to avoid double-counting.

    Data type constraints (50 xp)
    Common data types (100 xp)
    Converting data types (100 xp)
    Trimming strings (100 xp)
    Range constraints (50 xp)
    Ride duration constraints (100 xp)
    Back to the future (100 xp)
    Uniqueness constraints (50 xp)
    Full duplicates (100 xp)
    Removing partial duplicates (100 xp)
    Aggregating partial duplicates (100 xp)
  2. Categorical and Text Data

    Categorical and text data can often be some of the messiest parts of a dataset due to their unstructured nature. In this chapter, you’ll learn how to fix whitespace and capitalization inconsistencies in category labels, collapse multiple categories into one, and reformat strings for consistency.
  3. Advanced Data Problems

    In this chapter, you’ll dive into more advanced data cleaning problems, such as ensuring that weights are all written in kilograms instead of pounds. You’ll also gain invaluable skills that will help you verify that values have been added correctly and that missing values don’t negatively impact your analyses.
  4. Record Linkage

    Record linkage is a technique for merging datasets whose values contain typos or different spellings. In this chapter, you'll learn how to link records by calculating the similarity between strings. You’ll then use your new skills to join two restaurant review datasets into one clean master dataset.
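
For the Categorical and Text Data chapter in the outline above, the label cleanup it describes (whitespace, capitalization, collapsing categories, reformatting strings) might look like this in base R. The sample values and the `lookup` vector are invented for illustration.

```r
# Hypothetical survey responses; labels are invented.
animal <- c(" Dog", "dog ", "CAT", "cat", "Puppy", "kitten ")

# Fix whitespace and capitalization inconsistencies.
animal_clean <- tolower(trimws(animal))

# Collapse several labels into one category with a lookup vector.
lookup <- c(puppy = "dog", kitten = "cat")
is_alias <- animal_clean %in% names(lookup)
animal_clean[is_alias] <- unname(lookup[animal_clean[is_alias]])

counts <- table(animal_clean)

# Reformat strings for consistency: keep only the digits of a phone number.
phones <- c("(555) 123-4567", "555.123.4567")
phones_clean <- gsub("[^0-9]", "", phones)
```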

In the following tracks

Associate Data Scientist in R
Importing & Cleaning Data with R

Collaborators

Richie Cotton
Adel Nehme
Amy Peterson

Maggie Matsui

Curriculum Manager at DataCamp

Maggie is a Curriculum Manager at DataCamp. She holds a Bachelor's degree in Statistics and Computer Science from Brown University, where she spent lots of time teaching math, programming, and statistics as a tutor and teaching assistant. She's passionate about teaching all things data-related and making programming accessible to everyone.

Don’t just take our word for it

4.4 from 25 reviews
5 stars: 64%
4 stars: 24%
3 stars: 4%
2 stars: 4%
1 star: 4%
  • John G.
    8 months

    This course was great. It was informative with an excellent instructor who clearly explained the information.

  • Tara P.
    9 months

    There were some really nice ideas on here and it was very helpful. I think using the assertive package is not necessary though and would like to see this updated to more base functions and ideas.

  • Daniel M.
    10 months

    Great course, very useful content, I will retake it for sure. I wished there was a second part though.

  • Euler A.
    about 1 year

    Very good material. Working with string is excellent.

  • Nicolas F.
    about 1 year

    This course was succinct, simple, and effective. I learned a ton in a short period of time.