
course

Cleaning Data with PySpark

Advanced
4.1 (19 reviews)
Updated 12/2024
Learn how to clean data with Apache Spark in Python.

Included with Premium or Teams

Spark · Data Preparation · 4 hours · 16 videos · 53 exercises · 4,150 XP · 27,670 · Statement of Accomplishment


Course Description

Working with data is tricky, and working with millions or even billions of rows is worse. Did you receive some data processing code written on a laptop against fairly pristine data? Chances are you’ve been put in charge of moving a basic data process from prototype to production. You may have worked with real-world datasets that have missing fields, bizarre formatting, and orders of magnitude more data. Even if this is all new to you, this course teaches what’s needed to prepare data processes using Python with Apache Spark. You’ll learn the terminology, methods, and best practices for creating a performant, maintainable, and understandable data processing platform.

Prerequisites

Intermediate Python
Introduction to PySpark

Chapters

1. DataFrame details
2. Manipulating DataFrames in the real world
3. Improving Performance
4. Complex processing and data pipelines
Earn Statement of Accomplishment

Add this credential to your LinkedIn profile, resume, or CV
Share it on social media and in your performance review


Don’t just take our word for it

4.1 from 19 reviews

5 stars: 53%
4 stars: 21%
3 stars: 16%
2 stars: 11%
1 star: 0%
  • Flor S.
    about 1 month

    Best part for me is the interactive part where you get to apply immediately what was taught in the course through virtual coding.

  • Syed O.
    8 months

    I learned a lot from the course, and it covered many PySpark features not mentioned in other courses. However, more explanation with examples for the tougher, more complicated topics would have been better.

  • André S.
    9 months

    I learned a lot from this course. I also really liked the labs.

  • Douglas L.
    over 1 year

    Very Good Content.

  • Jegan D.
    over 1 year

    Very good course with challenging examples. The only problem is that I found it difficult to submit some of my answers or the solution provided. This happened in two different exercises.

