ETL and ELT in Python
Learn to build effective, performant, and reliable data pipelines using Extract, Transform, and Load principles.
4 hours · 14 videos · 53 exercises · 14,881 learners · Statement of Accomplishment
Course Description
Empowering Analytics with Data Pipelines
Data pipelines are at the foundation of every strong data platform. Building these pipelines is an essential skill for data engineers, who provide incredible value to a business ready to step into a data-driven future. This introductory course will help you hone the skills to build effective, performant, and reliable data pipelines.
Building and Maintaining ETL Solutions
Throughout this course, you’ll dive into the complete process of building a data pipeline. You’ll grow your skills with Python libraries such as pandas and json to extract data from structured and unstructured sources before it’s transformed and persisted for downstream use. Along the way, you’ll gain confidence with tools and techniques such as architecture diagrams, unit tests, and monitoring that will help set your data pipelines apart from the rest. As you progress, you’ll put your new-found skills to the test with hands-on exercises.
Supercharge Data Workflows
After completing this course, you’ll be ready to design, develop, and use data pipelines to supercharge your data workflow in your job, new career, or personal project.
In the following Tracks
Machine Learning Engineer
- 1
Introduction to Data Pipelines
Free
Get ready to discover how data is collected, processed, and moved using data pipelines. You will explore the qualities of the best data pipelines, and prepare to design and build your own.
- 2
Building ETL Pipelines
Dive into leveraging pandas to extract, transform, and load data as you build your first data pipelines. Learn how to make your ETL logic reusable, and apply logging and exception handling to your pipelines.
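The extract, transform, and load steps described in this chapter could be sketched roughly as follows. This is a minimal illustration and not the course's own code: the `sales` table, its columns, the function names, and the output path are all hypothetical, and an in-memory SQLite database stands in for whatever SQL source a real pipeline would read from.

```python
import logging
import os
import sqlite3
import tempfile

import pandas as pd

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("sales_pipeline")


def extract(conn, query):
    """Read the result of a SQL query into a DataFrame."""
    return pd.read_sql(query, conn)


def transform(raw):
    """Keep rows with a positive quantity and add a revenue column."""
    clean = raw.loc[raw["quantity"] > 0].copy()
    clean["revenue"] = clean["quantity"] * clean["unit_price"]
    return clean


def load(clean, path):
    """Write the DataFrame to a CSV file, logging success or failure."""
    try:
        clean.to_csv(path, index=False)
        logger.info("Loaded %d rows to %s", len(clean), path)
    except OSError:
        # Log the traceback, then re-raise so the failure is not hidden
        logger.exception("Could not write %s", path)
        raise


# Hypothetical source data held in an in-memory SQLite database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (item TEXT, quantity INTEGER, unit_price REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("apple", 3, 0.5), ("pear", 0, 0.75), ("plum", 2, 1.25)],
)
conn.commit()

# Run the three steps in order
output_path = os.path.join(tempfile.mkdtemp(), "sales.csv")
sales = transform(extract(conn, "SELECT * FROM sales"))
load(sales, output_path)
conn.close()
```

Keeping each step in its own function is one way to make the logic reusable: the same `transform` can be applied to data extracted from a different source without modification.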
Extracting data from structured sources (50 xp)
Extracting data from parquet files (100 xp)
Pulling data from SQL databases (100 xp)
Building functions to extract data (100 xp)
Transforming data with pandas (50 xp)
Filtering pandas DataFrames (100 xp)
Transforming sales data with pandas (100 xp)
Validating data transformations (100 xp)
Persisting data with pandas (50 xp)
Loading sales data to a CSV file (100 xp)
Customizing a CSV file (100 xp)
Persisting data to files (100 xp)
Monitoring a data pipeline (50 xp)
Logging within a data pipeline (100 xp)
Handling exceptions when loading data (100 xp)
Monitoring and alerting within a data pipeline (100 xp)
- 3
Advanced ETL Techniques
Supercharge your workflow with advanced data pipelining techniques, such as working with non-tabular data and persisting DataFrames to SQL databases. Discover tooling to tackle advanced transformations with pandas, and uncover best practices for working with complex data.
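As a rough sketch of the techniques named in this chapter, the example below parses nested JSON into rows, fills a missing value, groups the result, and persists it to a SQL table. The JSON payload, field names, and table name are invented for illustration, and SQLite is used here as a stand-in so the example runs without a server; writing to Postgres would need a suitable database connection instead.

```python
import json
import sqlite3

import pandas as pd

# Hypothetical nested JSON: one entry per school, keyed by an ID
raw_json = """
{
  "s1": {"city": "Buffalo", "scores": {"math": 410, "reading": 395}},
  "s2": {"city": "Buffalo", "scores": {"math": 480}},
  "s3": {"city": "Albany", "scores": {"math": 450, "reading": 470}}
}
"""
schools = json.loads(raw_json)

# Iterate over the dictionary and flatten each entry into one row
rows = []
for school_id, info in schools.items():
    scores = info.get("scores", {})
    rows.append({
        "school_id": school_id,
        "city": info.get("city"),
        "math": scores.get("math"),
        "reading": scores.get("reading"),  # None when the key is absent
    })

df = pd.DataFrame(rows)

# Fill the missing reading score with the column mean
reading = pd.to_numeric(df["reading"])
df["reading"] = reading.fillna(reading.mean())

# Average the scores per city
city_avg = df.groupby("city", as_index=False)[["math", "reading"]].mean()

# Persist to a SQL table, then read it back to validate the load
conn = sqlite3.connect(":memory:")
city_avg.to_sql("city_scores", conn, index=False, if_exists="replace")
loaded = pd.read_sql("SELECT * FROM city_scores ORDER BY city", conn)
conn.close()
```

Reading the table back after the load is a simple way to check that what was written matches the DataFrame that was transformed.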
Extracting non-tabular data (50 xp)
Ingesting JSON data with pandas (100 xp)
Reading JSON data into memory (100 xp)
Transforming non-tabular data (50 xp)
Iterating over dictionaries (100 xp)
Parsing data from dictionaries (100 xp)
Transforming JSON data (100 xp)
Transforming and cleaning DataFrames (100 xp)
Advanced data transformation with pandas (50 xp)
Filling missing values with pandas (100 xp)
Grouping data with pandas (100 xp)
Applying advanced transformations to DataFrames (100 xp)
Loading data to a SQL database with pandas (50 xp)
Loading data to a Postgres database (100 xp)
Validating data loaded to a Postgres Database (100 xp)
- 4
Deploying and Maintaining a Data Pipeline
In this final chapter, you’ll create frameworks to validate and test data pipelines before shipping them into production. After you’ve tested your pipeline, you’ll explore techniques to run your data pipeline end-to-end, all while allowing for visibility into pipeline performance.
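One way the unit-testing ideas in this chapter might look in practice is shown below, assuming pytest is installed. The `transform` function, the `build_raw_sales` helper, and the sample data are hypothetical and exist only to give the tests something to check; the fixture supplies the same input DataFrame to each test.

```python
import pandas as pd
import pytest


def transform(raw):
    """Hypothetical transformation: drop non-positive quantities, add revenue."""
    clean = raw.loc[raw["quantity"] > 0].copy()
    clean["revenue"] = clean["quantity"] * clean["unit_price"]
    return clean


def build_raw_sales():
    """Build a small, known input so expected results are easy to reason about."""
    return pd.DataFrame({
        "item": ["apple", "pear", "plum"],
        "quantity": [3, 0, 2],
        "unit_price": [0.5, 0.75, 1.25],
    })


@pytest.fixture
def raw_sales():
    # pytest passes this return value to any test that names the fixture
    return build_raw_sales()


def test_transform_drops_non_positive_quantities(raw_sales):
    clean = transform(raw_sales)
    assert (clean["quantity"] > 0).all()
    assert len(clean) == 2


def test_transform_adds_revenue(raw_sales):
    clean = transform(raw_sales)
    assert "revenue" in clean.columns
    assert clean["revenue"].tolist() == [1.5, 2.5]
```

Saved in a file whose name starts with `test_`, these tests would be discovered and run by the `pytest` command.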
Manually testing a data pipeline (50 xp)
Testing data pipelines (50 xp)
Validating a data pipeline at "checkpoints" (100 xp)
Testing a data pipeline end-to-end (100 xp)
Unit-testing a data pipeline (50 xp)
Validating a data pipeline with assert (100 xp)
Writing unit tests with pytest (100 xp)
Creating fixtures with pytest (100 xp)
Unit testing a data pipeline with fixtures (100 xp)
Running a data pipeline in production (50 xp)
Orchestration and ETL tools (50 xp)
Data pipeline architecture patterns (100 xp)
Running a data pipeline end-to-end (100 xp)
Congratulations! (50 xp)
Collaborators
Jake Roach
Data Engineer
Jake Roach is a Field Data Engineer at Astronomer and DataCamp Instructor. A former Lead Data Engineer, Jake built a state-of-the-art data platform for a multi-billion-dollar organization, powered by Astronomer, Airflow, AWS, and Databricks. His passion for all things data engineering is contagious. Jake loves to write tutorials, teach DataCamp courses, and contribute to open source. Born and raised in Buffalo, NY, when he's not working with data, you can find him out at the golf course playing a quick nine holes before dark!