skill track

Big Data with PySpark

Advance your data skills by mastering Apache Spark. Using PySpark, the Spark Python API, you will leverage parallel computation on large datasets and get ready for high-performance machine learning. From cleaning data to creating features and implementing machine learning models, you'll execute end-to-end workflows with Spark. The track ends with building a recommendation engine using the popular MovieLens dataset and the Million Songs dataset.

Python | 25 hrs | 6 courses | 1 project | Statement of Accomplishment



1. Introduction to PySpark

Learn to implement distributed data management and machine learning in Spark using the PySpark package.

4 hours

Lore Dirick

Director of Data Science Education at Flatiron School


Instructors

  • Lore Dirick, Director of Data Science Education at Flatiron School
  • Nick Solomon
  • Upendra Kumar Devisetty
  • Mike Metzger

Join over 15,140,000 learners and start Big Data with PySpark today!
