When your dataset is represented as a table or a database, it's difficult to observe much about it beyond its size and the types of variables it contains. In this course, you'll learn how to use graphical and numerical techniques to begin uncovering the structure of your data. Which variables suggest interesting relationships? Which observations are unusual? By the end of the course, you'll be able to answer these questions and more, while generating graphics that are both insightful and beautiful.
Exploring Categorical Data
In this chapter, you will learn how to create graphical and numerical summaries of two categorical variables.
Lessons: Exploring categorical data; Bar chart expectations; Contingency table review; Dropping levels; Side-by-side bar charts; Bar chart interpretation; Counts vs. proportions; Conditional proportions; Counts vs. proportions (2); Distribution of one variable; Marginal bar chart; Conditional bar chart; Improve pie chart.
Exploring Numerical Data
In this chapter, you will learn how to graphically summarize numerical data.
Lessons: Exploring numerical data; Faceted histogram; Boxplots and density plots; Compare distribution via plots; Distribution of one variable; Marginal and conditional histograms; Marginal and conditional histograms interpretation; Three binwidths; Three binwidths interpretation; Box plots; Box plots for outliers; Plot selection; Visualization in higher dimensions; 3 variable plot; Interpret 3 var plot.
Now that you've explored categorical and numerical data, you'll learn some useful statistics for describing distributions of data.
Lessons: Measures of center; Choice of center measure; Calculate center measures; Measures of variability; Choice of spread measure; Calculate spread measures; Choose measures for center and spread; Shape and transformations; Describe the shape; Transformations; Outliers; Identify outliers.
Apply what you've learned to explore and summarize a real-world dataset in this case study of email spam.
Lessons: Introducing the data; Spam and num_char; Spam and num_char interpretation; Spam and !!!; Spam and !!! interpretation; Check-in 1; Collapsing levels; Image and spam interpretation; Data Integrity; Answering questions with chains; Check-in 2; What's in a number?; What's in a number interpretation; Conclusion.
Andrew Bray
Assistant Professor of Statistics at Reed College
Andrew Bray is an assistant professor of statistics at Reed College. His interests are in computing, differential privacy, environmental statistics, and statistics education. He is a co-author of the infer package for tidy statistical inference.