Parallel Programming with Dask in Python
Learn how to use parallel programming with Dask in Python to scale up your workflows and handle big data efficiently.
4 hours | 15 videos | 51 exercises | 3,831 learners | Statement of Accomplishment
Course Description
Use Parallel Processing to Speed Up Your Python Code
With this 4-hour course, you’ll discover how parallel processing with Dask in Python can make your workflows faster.

When working with big data, you’ll face two common obstacles: using too much memory and long runtimes. The Dask library can lower your memory use by loading chunks of data only when needed. It can lower runtimes by using all your available computing cores in parallel. Best of all, it requires very few changes to your existing Python code.
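As a rough illustration of the "very few changes" claim, the sketch below wraps an ordinary function with `dask.delayed`. The function `double` and its inputs are hypothetical stand-ins, not material from the course.

```python
import dask


def double(x):
    # Hypothetical stand-in for any existing, slow Python function.
    return 2 * x


# Ordinary Python: each call runs immediately, one after another.
eager_total = sum(double(x) for x in range(5))

# Dask version: wrap the function once; calls now record tasks instead of running.
lazy_double = dask.delayed(double)
lazy_total = dask.delayed(sum)([lazy_double(x) for x in range(5)])

# Nothing has run so far. compute() executes the recorded tasks,
# in parallel where the scheduler allows it.
dask_total = lazy_total.compute()
print(eager_total, dask_total)
```

The only changes to the original code are the `dask.delayed` wrapper and the final `compute()` call.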
Analyze Big Structured Data Using Dask DataFrames
In this course, you’ll use Dask to analyze Spotify song data, process images of sign language gestures, calculate trends in weather data, analyze audio recordings, and train machine learning models on big data.

You’ll start by learning the basics of Dask, exploring how parallel processing in Python can speed up almost any code. Next, you’ll explore Dask DataFrames and arrays and how to use them to analyze big structured data.
Train Machine Learning Models Using Dask-ML
As you progress through the 51 exercises in this course, you’ll learn how to process any type of data, using Dask bags to work with unstructured and structured data. Finally, you’ll learn how to use Dask in Python to train machine learning models and improve your computing speeds.
1. Lazy Evaluation and Parallel Computing
This free chapter will teach you the basics of Dask and lazy evaluation. At the end of this chapter, you'll be able to speed up almost any Python code by using parallel processing or multi-threading. You'll learn the difference between these two task scheduling methods and which one is better under which circumstances.
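To make lazy evaluation and scheduler choice concrete, here is a minimal sketch using `dask.delayed`. The `square` function and the `calls` list are hypothetical helpers, added only to show when work actually happens.

```python
import dask

calls = []  # hypothetical log of inputs that have actually been processed


@dask.delayed
def square(x):
    calls.append(x)
    return x * x


# Building the pipeline only records tasks; no function body runs yet.
tasks = [square(x) for x in range(4)]
total = dask.delayed(sum)(tasks)
calls_before_compute = len(calls)

# Multi-threading: tasks run in one process and share memory.
threaded_result = total.compute(scheduler="threads")

# Single-threaded scheduler: no parallelism, which makes debugging easier.
sync_result = total.compute(scheduler="synchronous")

print(calls_before_compute, threaded_result, sync_result)
```

Passing `scheduler="processes"` instead runs tasks in separate worker processes. That sidesteps Python's global interpreter lock for CPU-bound pure-Python code, at the cost of copying data between processes. It is left out of the sketch because the shared `calls` list would not be updated across processes.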
Introduction to Dask (50 xp)
Lazy evaluation (50 xp)
Delaying functions (100 xp)
Task graphs and scheduling methods (50 xp)
What are the different schedulers? (100 xp)
Plotting the task graph (100 xp)
Building delayed pipelines (50 xp)
Analyzing songs on Spotify (100 xp)
How danceable are songs these days? (100 xp)
Most popular songs (100 xp)

2. Parallel Processing of Big, Structured Data
Here you’ll learn how to analyze big structured data using Dask arrays and Dask DataFrames. You'll learn how everything you know about NumPy and pandas can easily be applied to data that is too large to fit in memory.
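The sketch below shows, under simplifying assumptions, how NumPy habits carry over to Dask arrays. The data here is a small in-memory array, chosen so the example runs anywhere; in practice the chunks would be read from disk (for example from HDF5 or Zarr files). The chunk size is an arbitrary choice for illustration.

```python
import numpy as np
import dask.array as da

# Small stand-in for a dataset that would normally be too large for memory.
data = np.arange(1_000_000, dtype="float64").reshape(1000, 1000)

# Split the array into 250 x 250 chunks; Dask schedules work chunk by chunk.
x = da.from_array(data, chunks=(250, 250))

# Same method name as NumPy, but the result stays lazy until compute() is called.
lazy_col_means = x.mean(axis=0)
col_means = lazy_col_means.compute()

print(x.numblocks)      # number of chunks along each axis
print(col_means.shape)  # shape of the computed NumPy result
```

Dask DataFrames follow the same pattern for pandas: the familiar methods build a lazy result, and `compute()` returns an ordinary pandas object.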
Dask arrays (50 xp)
Dask array chunksizes (50 xp)
Loading and processing photos (100 xp)
An image processing pipeline (100 xp)
Dask DataFrames (50 xp)
Creating Dask dataframes from CSVs (100 xp)
Read Dask DataFrames from Parquet (100 xp)
Summertime grooves (100 xp)
Multidimensional arrays (50 xp)
Exploring HDF5 files (50 xp)
Dask arrays from HDF5 datasets (100 xp)
Dask arrays from Zarr datasets (100 xp)
Xarray (50 xp)
Exploratory data analysis with xarray (100 xp)
Monthly mean temperatures (100 xp)
Calculating the trend in European temperatures (100 xp)

3. Dask Bags for Unstructured Data
Process any kind of data. You'll learn how Dask bags can be used to efficiently process unstructured text data, semi-structured JSON data, and even recorded audio.
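As a small sketch of the semi-structured JSON case, the example below builds a Dask bag from a few hand-written JSON strings and chains `map`, `filter`, and `pluck`. The records are invented for illustration and are not from the course datasets.

```python
import json
import dask.bag as db

# Hypothetical semi-structured records, one JSON document per string.
# Note that the second record has no "city" field.
lines = [
    '{"name": "alice", "age": 36, "city": "London"}',
    '{"name": "bob", "age": 28}',
    '{"name": "carol", "age": 45, "city": "New York"}',
]

bag = db.from_sequence(lines, npartitions=2)

# Each step is lazy: parse the JSON, keep records over 30, extract the names.
names_over_30 = (
    bag.map(json.loads)
    .filter(lambda record: record["age"] > 30)
    .pluck("name")
)

# The threaded scheduler is requested explicitly to keep this example lightweight.
result = names_over_30.compute(scheduler="threads")
print(result)
```

Because a bag makes no assumption about a fixed schema, records with missing fields pass through without any special handling until a step actually needs that field.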
Introduction to Dask bags (50 xp)
Creating a Dask bag (100 xp)
Creating a bag from saved text (100 xp)
String operations (100 xp)
Dask bag operations (50 xp)
Loading JSON data (100 xp)
Filtering Dask bags (100 xp)
Chaining operations (100 xp)
Converting unstructured data to a DataFrame (50 xp)
Restructuring a dictionary (100 xp)
Converting to DataFrame (100 xp)
Using any data in Dask bags (50 xp)
Loading wav data (100 xp)
Constructing custom Dask bags (100 xp)
Processing unstructured audio (100 xp)

4. Dask Machine Learning and Final Pieces
Harness the power of Dask to train machine learning models. You'll learn how to train machine learning models on big data using the Dask-ML package, and how to split Dask calculations across a mixture of processes and threads for even greater computing speed.
Using processes and threads (50 xp)
Which scheduler will be used? (50 xp)
Clusters and clients (100 xp)
Training machine learning models on big datasets (50 xp)
Using Dask to train a linear model (100 xp)
Making lazy predictions (100 xp)
Preprocessing big datasets (50 xp)
Lazily transforming training data (100 xp)
Lazy train-test split (100 xp)
Wrap-up (50 xp)
Datasets
Spotify Songs - CSV
Spotify Songs - Parquet
European Rainfall - HDF5
European Rainfall - Zarr
Tripadvisor Hotel Reviews
Politicians

Collaborators
James Fulton
Climate Informatics Researcher
James is a PhD researcher at the University of Edinburgh, where he tutors computing, machine learning, data analysis, and statistical physics. His research involves using and developing machine learning algorithms to extract space-time patterns from climate records and climate models. He has held visiting researcher roles at the University of Oxford and Queen's University Belfast, working on planet-scale data analysis and modeling, and has a master's in physics, in which he specialized in quantum simulation. In a previous life, he was employed as a data scientist in the insurance sector. When not several indents deep in Python, he performs improvised comedy.