project

Cleaning an Orders Dataset with PySpark

Step into a data engineer's shoes and master data cleaning with PySpark on an e-commerce orders dataset!

1 Task, 1,500 XP, 514

Project Description

Data cleaning is an essential skill for any data professional.

In this project, you will step into the role of a data engineer at an e-commerce company and use PySpark, a powerful tool for data processing, to clean an orders dataset.

This hands-on experience will sharpen your ability to format, extract and amend data for further analysis.

Project Tasks

  1. Task 1

Technologies

Python, Spark

Topics

Data Engineering, Data Preparation

Rufat Mustafaev

Data Scientist, Booking.com

Rufat is a data scientist at Booking.com, a global travel-tech leader. He has a background in economics and has applied data science to complex problems in various industries, including management consulting, credit risk, fintech, and foodtech.