
Exploratory Data Analysis in Python

4.5+ rating · 31 reviews · Beginner

Learn how to explore, visualize, and extract insights from data.

4 hours · 16 videos · 52 exercises · 68,209 learners · 4,150 XP



Course Description

How do we get from data to answers? Exploratory data analysis is a process for exploring datasets, answering questions, and visualizing results. This course presents the tools you need to clean and validate data, to visualize distributions and relationships between variables, and to use regression models to predict and explain. You'll explore data related to demographics and health, including the National Survey of Family Growth and the General Social Survey, but the methods you learn apply to all areas of science, engineering, and business. You'll use pandas, a powerful library for working with data, along with other core Python libraries: NumPy, SciPy, statsmodels for regression, and Matplotlib for visualization. With these tools and skills, you will be prepared to work with real data, make discoveries, and present compelling results.
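
A minimal sketch of the clean-and-validate step, using pandas and NumPy. The DataFrame is made up, and the column name `birthwgt_lb` and the special codes 98 and 99 (standing in for answers such as "refused" or "don't know") are hypothetical assumptions, not values taken from the NSFG codebook. Validating with `value_counts` makes such codes visible; replacing them with `NaN` keeps them out of summary statistics.

```python
import numpy as np
import pandas as pd

# Hypothetical survey extract: birth weight in pounds, where 98 and 99
# are ASSUMED to be special codes for "refused" / "don't know".
df = pd.DataFrame({"birthwgt_lb": [7, 8, 98, 6, 99, 7, 9]})

# Validate: count each distinct value so unexpected codes stand out.
counts = df["birthwgt_lb"].value_counts().sort_index()
print(counts)

# Clean: replace the assumed special codes with NaN so they are
# treated as missing instead of as real (very large) weights.
df["birthwgt_lb"] = df["birthwgt_lb"].replace([98, 99], np.nan)

# Summary statistics now skip the missing values.
print(df["birthwgt_lb"].describe())
print(df["birthwgt_lb"].mean())
```

With the codes left in, the mean of this made-up column would be roughly 33.4 pounds; after cleaning, five valid values remain and the mean is 7.4.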
  1. Read, clean, and validate (free)

    The first step of almost any data project is to read the data, check for errors and special cases, and prepare it for analysis. In this chapter, you'll do exactly that with a dataset from the National Survey of Family Growth (NSFG).

    • DataFrames and Series (50 XP)
    • Read the codebook (50 XP)
    • Exploring the NSFG data (100 XP)
    • Clean and Validate (50 XP)
    • Validate a variable (50 XP)
    • Clean a variable (100 XP)
    • Compute a variable (100 XP)
    • Filter and visualize (50 XP)
    • Make a histogram (100 XP)
    • Compute birth weight (100 XP)
    • Filter (100 XP)
  2. Distributions

    In the first chapter, having cleaned and validated your data, you began exploring it by using histograms to visualize distributions. In this chapter, you'll learn how to represent distributions using Probability Mass Functions (PMFs) and Cumulative Distribution Functions (CDFs). You'll learn when to use each of them, and why, while working with a new dataset obtained from the General Social Survey.

  3. Relationships

    Up to this point, you've looked at only one variable at a time. In this chapter, you'll explore relationships between pairs of variables, using scatter plots and other visualizations to extract insights from a new dataset obtained from the Behavioral Risk Factor Surveillance System (BRFSS). You'll also learn how to quantify those relationships using correlation and simple regression.

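The ideas in chapters 2 and 3 can be sketched with plain pandas and NumPy. This is an illustration under stated assumptions, not the course's own code: the ages and the x/y values are made up, and the course may use other tools (for example SciPy, or dedicated PMF and CDF classes) for the same ideas. A PMF gives the fraction of values equal to each distinct value; a CDF accumulates it into the fraction of values less than or equal to each value; and `np.polyfit` with degree 1 fits a least-squares line.

```python
import numpy as np
import pandas as pd

# --- Distributions: PMF and CDF (made-up ages, for illustration only) ---
ages = pd.Series([20, 30, 30, 40, 40, 40, 50, 50, 60, 60])

# PMF: fraction of values equal to each distinct value, in sorted order.
pmf = ages.value_counts(normalize=True).sort_index()

# CDF: running total of the PMF, i.e. the fraction of values <= each value.
cdf = pmf.cumsum()

print(pmf)
print(cdf)
print("P(age <= 40) =", cdf.loc[40])

# --- Relationships: correlation and a simple linear fit (made-up data) ---
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Pearson correlation coefficient, read off the 2x2 correlation matrix.
r = np.corrcoef(x, y)[0, 1]

# Least-squares line y = slope * x + intercept; polyfit returns the
# highest-degree coefficient first, so the slope comes before the intercept.
slope, intercept = np.polyfit(x, y, 1)

print(f"r = {r:.3f}, slope = {slope:.2f}, intercept = {intercept:.2f}")
```

With this data, the CDF at 40 is about 0.6, the correlation is about 0.775, and the fitted line is y = 0.6x + 2.2. A CDF is often easier to compare across groups than a PMF because it is smooth where the PMF is noisy, which is one reason to prefer it when a variable has many distinct values.
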

In the following tracks

Data Analyst with Python, Data Scientist with Python

Collaborators

Chester Ismay
Yashas Roy

Allen Downey

Professor, Olin College

I am a Professor of Computer Science at Olin College in Needham, MA, and the author of Think Python, Think Bayes, Think Stats and several other books related to computer science and data science. Previously I taught at Wellesley College and Colby College, and in 2009 I was a Visiting Scientist at Google, Inc. I have a Ph.D. from U.C. Berkeley and B.S. and M.S. degrees from MIT. I write a blog about Bayesian statistics and related topics called Probably Overthinking It. Several of my books are published by O’Reilly Media and all are available under free licenses from Green Tea Press.

Don’t just take our word for it

4.5 out of 5 from 31 reviews
5 stars: 58%
4 stars: 35%
3 stars: 6%
2 stars: 0%
1 star: 0%
  • John H.
    7 days

    Clear Instructions for practice!

  • Oluwaseun J.
    18 days

    The course was easy to follow and practice, also the exercises were not too much so it was quicker to go through the course.

  • Vaishnavi K.
    26 days

    It is an amazing course

  • Lukas B.
    26 days

    Great course on data exploration!

  • Gocha T.
    about 1 month

    I love DataCamp, I've got everything I wanted to learn and practice. Really cool.


Join over 11 million learners and start Exploratory Data Analysis in Python today!
