Joining Data with pandas
Learn to combine data from multiple tables by joining them with pandas.
4 hours | 15 videos | 51 exercises | 164,238 learners | Statement of Accomplishment
Course Description
Being able to combine and work with multiple datasets is an essential skill for any aspiring data scientist. pandas is a cornerstone of the Python data science ecosystem, with Stack Overflow recording 5 million views for pandas questions. Learn to handle multiple DataFrames by combining, organizing, joining, and reshaping them using pandas. You'll work with datasets from the World Bank and the City of Chicago. You will finish the course with a solid skillset for joining data in pandas.
In the following Tracks
Data Manipulation in Python
1. Data Merging Basics
Free chapter. Learn how you can merge disparate data using inner joins. By combining information from multiple sources you’ll uncover compelling insights that may have previously been hidden. You’ll also learn how the relationship between those sources, such as one-to-one or one-to-many, can affect your result.
- Inner join (50 xp)
- What column to merge on? (50 xp)
- Your first inner join (100 xp)
- Inner joins and number of rows returned (100 xp)
- One-to-many relationships (50 xp)
- One-to-many classification (100 xp)
- One-to-many merge (100 xp)
- Merging multiple DataFrames (50 xp)
- Total riders in a month (100 xp)
- Three table merge (100 xp)
- One-to-many merge with multiple tables (100 xp)
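As a rough illustration of the ideas in this chapter, here is a minimal sketch of an inner join over a one-to-many relationship. The `wards` and `licenses` tables and every value in them are invented for this example; they are not the course's Chicago datasets.

```python
import pandas as pd

# Hypothetical tables: one row per ward, possibly several licenses per ward.
wards = pd.DataFrame({
    "ward": [1, 2, 3],
    "alderman": ["A. Smith", "B. Jones", "C. Lee"],
})
licenses = pd.DataFrame({
    "ward": [1, 1, 2, 4],
    "business": ["Cafe", "Bakery", "Garage", "Florist"],
})

# Inner join (the default for merge): keep only keys present in both tables.
# Ward 3 has no license and ward 4 has no ward record, so both drop out.
ward_licenses = wards.merge(licenses, on="ward")

# One-to-many: the single ward 1 row is repeated once per matching license.
print(ward_licenses)
```

Because ward 1 matches two licenses, the result can have more rows than `wards` had matching keys, which is the effect the one-to-many exercises explore.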
2. Merging Tables With Different Join Types
Take your knowledge of joins to the next level. In this chapter, you’ll work with TMDb movie data as you learn about left, right, and outer joins. You’ll also discover how to merge a table to itself and merge on a DataFrame index.
- Left join (50 xp)
- Counting missing rows with left join (100 xp)
- Enriching a dataset (100 xp)
- How many rows with a left join? (50 xp)
- Other joins (50 xp)
- Right join to find unique movies (100 xp)
- Popular genres with right join (100 xp)
- Using outer join to select actors (100 xp)
- Merging a table to itself (50 xp)
- Self join (100 xp)
- How does pandas handle self joins? (50 xp)
- Merging on indexes (50 xp)
- Index merge for movie ratings (100 xp)
- Do sequels earn more? (100 xp)
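The join types in this chapter can be sketched on a toy table. The example below is only an illustration: the movie ids, titles, and taglines are made up, and using `0` to mean "no sequel" is an assumption of this sketch, not something taken from the TMDb data.

```python
import pandas as pd

# Hypothetical movie tables; all values are invented for illustration.
movies = pd.DataFrame({
    "id": [1, 2, 3],
    "title": ["Alpha", "Alpha II", "Beta"],
    "sequel": [2, 0, 0],  # id of the sequel; 0 is an assumed "no sequel" marker
})
taglines = pd.DataFrame({
    "id": [1, 3],
    "tagline": ["It begins.", "Start over."],
})

# Left join: keep every movie; movies without a tagline get NaN.
movies_taglines = movies.merge(taglines, on="id", how="left")
n_missing = int(movies_taglines["tagline"].isna().sum())

# Self join: match each movie's `sequel` value to the `id` of another row
# in the same table. Suffixes tell the two copies of each column apart.
pairs = movies.merge(
    movies, left_on="sequel", right_on="id", suffixes=("_org", "_seq")
)

# Index merge: join on the index instead of on a column.
by_index = movies.set_index("id").merge(
    taglines.set_index("id"), left_index=True, right_index=True, how="left"
)

print(n_missing)
print(pairs[["title_org", "title_seq"]])
```

Swapping `how="left"` for `"right"` or `"outer"` changes which side's unmatched rows survive, which is a quick way to compare the join types on the same inputs.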
3. Advanced Merging and Concatenating
In this chapter, you’ll leverage powerful filtering techniques, including semi-joins and anti-joins. You’ll also learn how to stack DataFrames vertically with the pandas.concat function to create new datasets. Finally, because data is rarely clean, you’ll also learn how to validate your newly combined data structures.
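pandas has no dedicated semi-join or anti-join function, so filtering joins are usually assembled from `merge` and `isin`. The following is one possible sketch, using invented `genres` and `tracks` tables; the course exercises may take slightly different steps.

```python
import pandas as pd

# Hypothetical tables; names and values are invented for illustration.
genres = pd.DataFrame({"gid": [1, 2, 3], "name": ["Rock", "Jazz", "Opera"]})
tracks = pd.DataFrame({"tid": [10, 11, 12], "gid": [1, 1, 2]})

# Semi join: genres that have at least one track. Only columns from the
# left table are kept, and no genre row is duplicated.
semi = genres[genres["gid"].isin(tracks["gid"])]

# Anti join: genres with no track at all. The indicator column `_merge`
# marks rows found only in the left table as "left_only".
flagged = genres.merge(
    tracks[["gid"]].drop_duplicates(), on="gid", how="left", indicator=True
)
anti = flagged.loc[flagged["_merge"] == "left_only", ["gid", "name"]]

print(semi)
print(anti)
```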
- Filtering joins (50 xp)
- Steps of a semi join (100 xp)
- Performing an anti join (100 xp)
- Performing a semi join (100 xp)
- Concatenate DataFrames together vertically (50 xp)
- Concatenation basics (100 xp)
- Concatenating with keys (100 xp)
- Verifying integrity (50 xp)
- Validating a merge (50 xp)
- Concatenate and merge to find common songs (100 xp)
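For the concatenation and validation topics listed above, a small sketch may help. The monthly `jan` and `feb` tables and the `artists` table are hypothetical, and the duplicate key in `artists` is planted deliberately so that validation fails.

```python
import pandas as pd

# Hypothetical monthly tables with identical columns.
jan = pd.DataFrame({"song": ["A", "B"], "plays": [10, 20]})
feb = pd.DataFrame({"song": ["B", "C"], "plays": [5, 15]})

# Stack vertically; `keys` labels each source table in an outer index level.
plays = pd.concat([jan, feb], keys=["jan", "feb"])

# Alternatively, ignore_index=True discards the original row labels.
flat = pd.concat([jan, feb], ignore_index=True)

# `validate` checks the claimed relationship before merging and raises
# MergeError if it does not hold. Song "B" appears twice in `artists`,
# so the merge is not one-to-one.
artists = pd.DataFrame({"song": ["A", "B", "B"], "artist": ["X", "Y", "Z"]})
try:
    jan.merge(artists, on="song", validate="one_to_one")
    relationship_ok = True
except pd.errors.MergeError:
    relationship_ok = False

print(plays)
print(relationship_ok)
```

`pd.concat` offers a similar guard through `verify_integrity=True`, which raises an error when the combined index contains duplicates.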
4. Merging Ordered and Time-Series Data
In this final chapter, you’ll step up a gear and learn to apply pandas' specialized methods for merging time-series and ordered data together with real-world financial and economic data from the city of Chicago. You’ll also learn how to query resulting tables using a SQL-style format, and unpivot data using the melt method.
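To make the difference between the two ordered-merge functions concrete, here is a hedged sketch on invented data. The `gdp`, `index_level`, `trades`, and `quotes` tables are placeholders for this example, not the course's financial datasets.

```python
import pandas as pd

# Hypothetical quarterly and monthly series; values are invented.
gdp = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01", "2020-04-01"]),
    "gdp": [100.0, 90.0],
})
index_level = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01"]),
    "close": [3200.0, 3250.0, 2950.0],
})

# merge_ordered: outer join by default, sorted on the key; forward fill
# carries the last known value into the gaps the join creates.
ordered = pd.merge_ordered(gdp, index_level, on="date", fill_method="ffill")

# merge_asof: for each left row, take the most recent right row whose key
# is less than or equal to the left key. Both inputs must be sorted on it.
trades = pd.DataFrame({
    "time": pd.to_datetime(["2020-01-01 09:30:05", "2020-01-01 09:30:20"]),
    "qty": [10, 5],
})
quotes = pd.DataFrame({
    "time": pd.to_datetime([
        "2020-01-01 09:30:00", "2020-01-01 09:30:10", "2020-01-01 09:30:30",
    ]),
    "price": [50.0, 51.0, 52.0],
})
asof = pd.merge_asof(trades, quotes, on="time")

print(ordered)
print(asof)
```

`merge_ordered` matches keys exactly and then fills gaps, while `merge_asof` matches on the nearest earlier key, so it suits series whose timestamps rarely line up.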
- Using merge_ordered() (50 xp)
- Correlation between GDP and S&P500 (100 xp)
- Phillips curve using merge_ordered() (100 xp)
- merge_ordered() caution, multiple columns (100 xp)
- Using merge_asof() (50 xp)
- Using merge_asof() to study stocks (100 xp)
- Using merge_asof() to create dataset (100 xp)
- merge_asof() and merge_ordered() differences (100 xp)
- Selecting data with .query() (50 xp)
- Explore financials with .query() (50 xp)
- Subsetting rows with .query() (100 xp)
- Reshaping data with .melt() (50 xp)
- Select the right .melt() arguments (50 xp)
- Using .melt() to reshape government data (100 xp)
- Using .melt() for stocks vs bond performance (100 xp)
- Course wrap-up (50 xp)
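The `.melt()` and `.query()` exercises above can be previewed with a short sketch. The `wide` table, its column names, and its numbers are assumptions made for this example only.

```python
import pandas as pd

# Hypothetical wide table: one column per year. Values are invented.
wide = pd.DataFrame({
    "country": ["A", "B"],
    "2019": [1.0, 2.0],
    "2020": [1.5, 1.8],
})

# melt unpivots the year columns into rows: one row per country and year.
long = wide.melt(id_vars="country", var_name="year", value_name="gdp")

# query filters rows with a SQL-like expression written as a string.
# `year` holds the former column names, so it is compared as text.
recent = long.query('year == "2020" and gdp > 1.6')

print(long)
print(recent)
```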
In other tracks
Python Data Fundamentals
Datasets
- Chicago Wards
- Chicago Business Licenses
- Chicago Census
- Chicago Demographics by Zip Code
- Chicago Business Owners
- Chicago Land Use
- Chicago Taxi Vehicles
- Chicago Taxi Owners
- CTA Ridership
- CTA Calendar
- CTA Stations
- Movies
- Movie Actors
- Movie Ratings
- Movie Casts
- Movie Crews
- Movie Genres
- Movie Sequels
- Movie Financial Data
- Movie Tag Lines
- S&P 500
- World Bank GDP
- World Bank Population
Prerequisites
Data Manipulation with pandas
Aaren Stubberfield
Senior Data Scientist @ Microsoft
I am a Senior Data Scientist with expertise in Machine Learning, AI, and data governance. Currently, I work for Microsoft's Digital Advertising, which has revenues of more than $10 billion in the fiscal year 2023. However, my experience is not limited to just the advertising industry. I have worked in the Supply Chain and Data Governance industries.
With my vast experience, I have led numerous teams of data scientists and have been instrumental in the successful completion of many projects. My technical skills include AI (such as LLMs), Python, and various other tools needed to carry out data science projects.
My passion lies in using data to gain insights and making data-driven decisions. I constantly strive to improve my skills and knowledge and am always open to learning new techniques and tools.