Introduction to Data Engineering
Learn about the world of data engineering in this short course, covering tools and topics like ETL and cloud computing.
4 hours | 15 videos | 57 exercises | 114,667 learners | Statement of Accomplishment
Course Description
Get Started in Data Engineering
Are you curious about a career in data engineering but don’t know where to start? Or perhaps you want more information on what data engineers do before you take the next steps? This four-hour course is an introduction to data engineering and the core concepts, techniques, and tools you need to understand to do the job.
Learn Data Engineering Concepts and Techniques
You’ll start by learning the differences between a data engineer and a data scientist (and how they work together) before finding out more about the tools of the trade, specifically cloud computing and parallel computing. By the end of the second chapter, you’ll understand the applications of SQL and NoSQL, how to use DataFrames, and why parallel computing is so important.
Perform ETL in Hands-on Exercises
The ETL process is core to a data engineer’s workflow. You will learn how data is extracted, transformed, and loaded to get it ready for analysis and generating insights. At the end of the course, you’ll put all this knowledge into practice by performing and scheduling an ETL process yourself using real-world data.
Our exercises and interactive tests allow you to review and cement your new knowledge, so you’re confident discussing and applying it once you’ve received your Statement of Accomplishment.
This introductory course is part of a data engineering Track, which offers you pathways to improve your understanding of data engineering and a clear set of next steps to becoming a professional data engineer.
1. Introduction to Data Engineering
Free
In this first chapter, you will be exposed to the world of data engineering! Explore the differences between a data engineer and a data scientist, get an overview of the various tools data engineers use and expand your understanding of how cloud technology plays a role in data engineering.
- What is data engineering? (50 xp)
- Tasks of the data engineer (50 xp)
- Data engineer or data scientist? (100 xp)
- Data engineering problems (50 xp)
- Tools of the data engineer (50 xp)
- Kinds of databases (50 xp)
- Processing tasks (50 xp)
- Scheduling tools (50 xp)
- Cloud providers (50 xp)
- Why cloud computing? (50 xp)
- Big players in cloud computing (100 xp)
- Cloud services (100 xp)
2. Data engineering toolbox
Now that you know the primary differences between a data engineer and a data scientist, get ready to explore the data engineer's toolbox! Learn in detail about different types of databases data engineers use, how parallel computing is a cornerstone of the data engineer's toolkit, and how to schedule data processing jobs using scheduling frameworks.
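As a rough illustration of why parallel computing matters, the sketch below splits one task into subtasks, runs them on a pool of workers, and combines the partial results. It is not taken from the course material: the function names (`split_into_chunks`, `sum_of_squares`, `parallel_sum_of_squares`) are hypothetical, and a thread pool from Python's standard library stands in for the worker processes or cluster nodes that a framework such as Spark would use.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(values, n_chunks):
    """Split a list into at most n_chunks contiguous pieces of similar size."""
    if not values:
        return []
    size = -(-len(values) // n_chunks)  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def sum_of_squares(chunk):
    """Subtask: process one chunk independently of all the others."""
    return sum(v * v for v in chunk)


def parallel_sum_of_squares(values, n_workers=4):
    """Run the subtask on every chunk in a worker pool, then combine."""
    chunks = split_into_chunks(values, n_workers)
    # Assumption: threads are used only to keep the sketch self-contained.
    # CPU-bound work in CPython would normally use processes or a cluster.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_results = list(pool.map(sum_of_squares, chunks))
    return sum(partial_results)


print(parallel_sum_of_squares(list(range(1, 11))))  # prints 385
```

The pattern is the same at any scale: the subtasks must not depend on each other, and the final step only needs the partial results.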
- Databases (50 xp)
- SQL vs NoSQL (100 xp)
- The database schema (100 xp)
- Joining on relations (100 xp)
- Star schema diagram (50 xp)
- What is parallel computing (50 xp)
- Why parallel computing? (50 xp)
- From task to subtasks (100 xp)
- Using a DataFrame (100 xp)
- Parallel computation frameworks (50 xp)
- Spark, Hadoop and Hive (100 xp)
- A PySpark groupby (100 xp)
- Running PySpark files (50 xp)
- Workflow scheduling frameworks (50 xp)
- Airflow, Luigi and cron (50 xp)
- Airflow DAGs (100 xp)
3. Extract, Transform and Load (ETL)
Having been exposed to the toolbox of data engineers, it's now time to jump into the bread and butter of a data engineer's workflow! With ETL, you will learn how to extract raw data from various sources, transform this raw data into actionable insights, and load it into relevant databases ready for consumption!
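The three ETL steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the course's own exercise code: the CSV text, the `films` table, and the function names are hypothetical, and an in-memory SQLite database stands in for a target database such as Postgres.

```python
import csv
import io
import sqlite3

# Hypothetical raw data standing in for a file or API response.
RAW_CSV = """title,rental_rate
Film A,2.99
Film B,0.99
"""


def extract(raw_text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: split the rental rate into whole dollars and cents."""
    records = []
    for row in rows:
        dollars, cents = row["rental_rate"].split(".")
        records.append((row["title"], int(dollars), int(cents)))
    return records


def load(records, conn):
    """Load: write the transformed records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS films "
        "(title TEXT, rate_dollars INTEGER, rate_cents INTEGER)"
    )
    conn.executemany("INSERT INTO films VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl(raw_text, conn):
    """Chain the three steps in order."""
    load(transform(extract(raw_text)), conn)


conn = sqlite3.connect(":memory:")  # assumption: SQLite instead of Postgres
run_etl(RAW_CSV, conn)
print(conn.execute("SELECT * FROM films ORDER BY title").fetchall())
# prints [('Film A', 2, 99), ('Film B', 0, 99)]
```

Keeping each step in its own function makes it straightforward to hand the same functions to a scheduler later, one task per step.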
- Extract (50 xp)
- Data sources (50 xp)
- Fetch from an API (100 xp)
- Read from a database (100 xp)
- Transform (50 xp)
- Splitting the rental rate (100 xp)
- Prepare for transformations (50 xp)
- Joining with ratings (100 xp)
- Loading (50 xp)
- OLAP or OLTP (50 xp)
- Writing to a file (100 xp)
- Load into Postgres (100 xp)
- Putting it all together (50 xp)
- Defining a DAG (100 xp)
- Setting up Airflow (50 xp)
- Interpreting the DAG (50 xp)
4. Case Study: DataCamp
Cap off all that you've learned in the previous three chapters by completing a real-world data engineering use case from DataCamp! You will perform and schedule an ETL process that transforms raw course rating data into actionable course recommendations for DataCamp students!
- Course ratings (50 xp)
- Exploring the schema (50 xp)
- Querying the table (100 xp)
- Average rating per course (100 xp)
- From ratings to recommendations (50 xp)
- Filter out corrupt data (100 xp)
- Using the recommender transformation (100 xp)
- Scheduling daily jobs (50 xp)
- The target table (100 xp)
- Defining the DAG (100 xp)
- Enable the DAG (50 xp)
- Querying the recommendations (100 xp)
- Congratulations (50 xp)
Vincent Vankrunkelsven
Data and Software Engineer @DataCamp