Case Studies in Statistical Thinking
Take vital steps towards mastery as you apply your statistical thinking skills to real-world data sets and extract actionable insights from them.
4 hours, 16 videos, 61 exercises, 15,248 learners, Statement of Accomplishment
Course Description
Mastery requires practice. Having completed Statistical Thinking I and II, you have developed a probabilistic mindset and the hacker stats skills to extract actionable insights from your data. Your foundation is in place, and now it is time to practice your craft.
In this course, you will apply your statistical thinking skills (exploratory data analysis, parameter estimation, and hypothesis testing) to two new real-world data sets. First, you will explore data from the 2013 and 2015 FINA World Aquatics Championships, where you will quantify the relative speeds and variability among swimmers. You will then perform a statistical analysis to assess the "current controversy" of the 2013 Worlds, in which swimmers claimed that a slight current in the pool was affecting results. Second, you will study the frequency and magnitudes of earthquakes around the world. Finally, you will analyze the changes in seismicity in the US state of Oklahoma after the practice of high-pressure wastewater injection at oil extraction sites became commonplace in the last decade. As you work with these data sets, you will take vital steps toward mastery as you cement your existing knowledge and broaden your ability to use statistics and Python to make sense of your data.
1. Fish sleep and bacteria growth: A review of Statistical Thinking I and II
Free chapter. To begin, you'll use two data sets from Caltech researchers to rehash the key points of Statistical Thinking I and II to prepare you for the following case studies!
Activity of zebrafish and melatonin (50 xp)
EDA: Plot ECDFs of active bout length (100 xp)
Interpreting ECDFs and the story (50 xp)
Bootstrap confidence intervals (50 xp)
Parameter estimation: active bout length (100 xp)
Permutation and bootstrap hypothesis tests (50 xp)
Permutation test: wild type versus heterozygote (100 xp)
Bootstrap hypothesis test (100 xp)
Linear regressions and pairs bootstrap (50 xp)
Assessing the growth rate (100 xp)
Plotting the growth curve (100 xp)
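As a rough illustration of two ideas this chapter reviews, the sketch below computes an ECDF and a bootstrap percentile confidence interval for a mean. It uses only the Python standard library and synthetic numbers; the helper names (`ecdf`, `bootstrap_replicates`, `percentile_interval`) and the data are assumptions made for this example, not the course's code or data sets.

```python
import random
import statistics


def ecdf(data):
    """Return sorted values x and ECDF heights y, with y[i] = (i + 1) / n."""
    x = sorted(data)
    n = len(x)
    y = [(i + 1) / n for i in range(n)]
    return x, y


def bootstrap_replicates(data, func, size, rng):
    """Compute func on `size` resamples of data drawn with replacement."""
    n = len(data)
    return [func(rng.choices(data, k=n)) for _ in range(size)]


def percentile_interval(reps, lower=2.5, upper=97.5):
    """Approximate percentile interval read off the sorted replicates."""
    s = sorted(reps)
    lo = s[int(len(s) * lower / 100)]
    hi = s[min(len(s) - 1, int(len(s) * upper / 100))]
    return lo, hi


rng = random.Random(42)

# Synthetic stand-in for active bout lengths (hypothetical, in minutes).
bout_lengths = [rng.expovariate(1 / 5.0) for _ in range(200)]

x, y = ecdf(bout_lengths)
reps = bootstrap_replicates(bout_lengths, statistics.mean, 2000, rng)
ci_low, ci_high = percentile_interval(reps)

print(f"mean = {statistics.mean(bout_lengths):.2f}")
print(f"95% bootstrap CI = [{ci_low:.2f}, {ci_high:.2f}]")
```

Plotting `y` against `x` as points gives the ECDF; the interval summarizes how much the estimated mean would vary under repeated sampling.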
2. Analysis of results of the 2015 FINA World Swimming Championships
In this chapter, you will practice your EDA, parameter estimation, and hypothesis testing skills on the results of the 2015 FINA World Swimming Championships.
Introduction to swimming data (50 xp)
Graphical EDA of men's 200 free heats (100 xp)
200 m free time with confidence interval (100 xp)
Do swimmers go faster in the finals? (50 xp)
EDA: finals versus semifinals (100 xp)
Parameter estimates of difference between finals and semifinals (100 xp)
How to do the permutation test (50 xp)
Generating permutation samples (100 xp)
Hypothesis test: Do women swim the same way in semis and finals? (100 xp)
How does the performance of swimmers decline over long events? (50 xp)
EDA: Plot all your data (100 xp)
Linear regression of average split time (100 xp)
Hypothesis test: are they slowing down? (100 xp)
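The permutation tests named in this chapter can be sketched generically as follows. This is a minimal two-sample version with a difference-of-means test statistic, run on synthetic times; the course may use a different statistic or a different scheme for paired semifinal and final swims, and every name and number here is an assumption for illustration.

```python
import random
import statistics


def diff_of_means(a, b):
    """Test statistic: mean of a minus mean of b."""
    return statistics.mean(a) - statistics.mean(b)


def permutation_replicates(a, b, func, size, rng):
    """Shuffle the pooled data and recompute func on the relabeled groups."""
    pooled = list(a) + list(b)
    reps = []
    for _ in range(size):
        rng.shuffle(pooled)
        reps.append(func(pooled[:len(a)], pooled[len(a):]))
    return reps


rng = random.Random(0)

# Synthetic swim times in seconds (hypothetical, not championship results).
semis = [rng.gauss(54.0, 0.8) for _ in range(16)]
finals = [rng.gauss(53.9, 0.8) for _ in range(16)]

observed = diff_of_means(semis, finals)
reps = permutation_replicates(semis, finals, diff_of_means, 5000, rng)

# Two-sided p-value: fraction of replicates at least as extreme as observed.
p_value = sum(abs(r) >= abs(observed) for r in reps) / len(reps)
print(f"observed difference = {observed:.3f} s, p = {p_value:.3f}")
```

Shuffling the pooled times simulates the null hypothesis that the round a swim belongs to has no effect on the time.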
3. The "Current Controversy" of the 2013 World Championships
Some swimmers said that it felt easier to swim in one direction than the other at the 2013 World Championships. Some analysts have posited that there was a swirling current in the pool. In this chapter, you'll investigate this claim!
References: Quartz Media, Washington Post, SwimSwam, and Cornett et al.
Introduction to the current controversy (50 xp)
A metric for improvement (50 xp)
ECDF of improvement from low to high lanes (100 xp)
Estimation of mean improvement (100 xp)
How should we test the hypothesis? (50 xp)
Hypothesis test: Does lane assignment affect performance? (100 xp)
Did the 2015 event have this problem? (100 xp)
The zigzag effect (50 xp)
Which splits should we consider? (50 xp)
EDA: mean differences between odd and even splits (100 xp)
How does the current effect depend on lane position? (100 xp)
Hypothesis test: can this be by chance? (100 xp)
Recap of swimming analysis (50 xp)
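One possible way to test whether lane assignment affects performance is a one-sample bootstrap test on a per-swimmer improvement metric. The sketch below assumes a fractional improvement of the form (low-lane time minus high-lane time) divided by low-lane time, and simulates the null hypothesis by shifting the data to have zero mean. Both the metric and the synthetic times are assumptions for illustration; the course's own metric and data may differ.

```python
import random
import statistics

rng = random.Random(1)

# Synthetic times for the same swimmers in a low- and a high-numbered lane
# (hypothetical values, not the 2013 results).
low_lane = [rng.gauss(25.0, 0.5) for _ in range(40)]
high_lane = [t - rng.gauss(0.1, 0.15) for t in low_lane]

# Assumed metric: fractional improvement going from a low to a high lane.
improvement = [(lo - hi) / lo for lo, hi in zip(low_lane, high_lane)]
observed_mean = statistics.mean(improvement)

# Null hypothesis: mean improvement is zero. Shift the data to enforce it.
shifted = [f - observed_mean for f in improvement]

# Bootstrap replicates of the mean under the null hypothesis.
n = len(shifted)
reps = [statistics.mean(rng.choices(shifted, k=n)) for _ in range(5000)]

# One-sided p-value: how often chance alone gives a mean this large.
p_value = sum(r >= observed_mean for r in reps) / len(reps)
print(f"mean fractional improvement = {observed_mean:.4f}, p = {p_value:.4f}")
```

Running the same analysis on results from a different year is one way to check whether an apparent effect is specific to one pool.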
4. Statistical seismology and the Parkfield region
Here, you'll use your statistical thinking skills to study the frequency and magnitudes of earthquakes. Along the way, you'll learn some basic statistical seismology, including the Gutenberg-Richter law. This exercise exposes two key ideas about data science: 1) As a data scientist, you wander into all sorts of domain-specific analyses, which is very exciting. You constantly get to learn. 2) You are sometimes faced with limited data, which is the case for many of these earthquake studies. You can still make good progress!
Introduction to statistical seismology and the Parkfield experiment (50 xp)
Parkfield earthquake magnitudes (100 xp)
Computing the b-value (100 xp)
The b-value for Parkfield (100 xp)
Timing of major earthquakes and the Parkfield sequence (50 xp)
Interearthquake time estimates for Parkfield (100 xp)
When will the next big Parkfield quake be? (100 xp)
How are the Parkfield interearthquake times distributed? (50 xp)
Computing the value of a formal ECDF (50 xp)
Computing the K-S statistic (100 xp)
Drawing K-S replicates (100 xp)
The K-S test for Exponentiality (100 xp)
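To make the b-value and the K-S statistic concrete, here is a small sketch on synthetic data. It uses the standard maximum-likelihood (Aki) estimator of the Gutenberg-Richter b-value, b = log10(e) / (mean(M) - M_c), which may not match the convention the course adopts, and it computes only the K-S distance between an ECDF and an Exponential CDF, not the full replicate-based test. The function names, completeness threshold, and data are assumptions for illustration.

```python
import math
import random
import statistics


def b_value(mags, m_c):
    """Aki maximum-likelihood b-value for magnitudes at or above m_c."""
    above = [m for m in mags if m >= m_c]
    return math.log10(math.e) / (statistics.mean(above) - m_c)


def ks_statistic_exponential(data, mean):
    """Largest vertical distance between the ECDF of data and the
    CDF of an Exponential distribution with the given mean."""
    x = sorted(data)
    n = len(x)
    d = 0.0
    for i, xi in enumerate(x):
        cdf = 1.0 - math.exp(-xi / mean)
        # The ECDF jumps from i/n to (i + 1)/n at xi; check both sides.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


rng = random.Random(7)

# Synthetic magnitudes: above the threshold, Gutenberg-Richter implies an
# Exponential excess with rate b * ln(10). Here the true b is set to 1.
m_c = 3.0
mags = [m_c + rng.expovariate(1.0 * math.log(10)) for _ in range(5000)]
b_hat = b_value(mags, m_c)

# Synthetic inter-earthquake times (hypothetical, in years).
gaps = [rng.expovariate(1 / 25.0) for _ in range(500)]
d_stat = ks_statistic_exponential(gaps, statistics.mean(gaps))

print(f"b-value estimate = {b_hat:.3f}")
print(f"K-S statistic vs. Exponential = {d_stat:.3f}")
```

A full K-S test would compare `d_stat` against K-S statistics computed from many samples simulated under the Exponential model.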
5. Earthquakes and oil mining in Oklahoma
Earthquakes have a big impact on society, and in recent years some have been linked to human activity. In this final chapter, you'll investigate the effect that increased injection of saline wastewater due to oil mining in Oklahoma has had on the seismicity of the region.
Variations in earthquake frequency and seismicity (50 xp)
EDA: Plotting earthquakes over time (100 xp)
Estimates of the mean interearthquake times (100 xp)
Hypothesis test: did earthquake frequency change? (100 xp)
How to display your analysis (50 xp)
Earthquake magnitudes in Oklahoma (50 xp)
EDA: Comparing magnitudes before and after 2010 (100 xp)
Quantification of the b-values (100 xp)
How should we do a hypothesis test on differences of the b-value? (50 xp)
Hypothesis test: are the b-values different? (100 xp)
What can you conclude from this analysis? (50 xp)
Closing comments (50 xp)
Datasets
Swimming results, 2013 World Aquatics Championships
Swimming results, 2015 World Aquatics Championships
Zebrafish active bout lengths
Oklahoma earthquakes (1950 to mid-2017)
Bacterial growth
Parkfield earthquakes (1950 to mid-2017)
Collaborators
Justin Bois
Lecturer at the California Institute of Technology
Justin Bois is a Teaching Professor in the Division of Biology and Biological Engineering at the California Institute of Technology. He teaches nine different classes there, nearly all of which heavily feature Python. He is dedicated to empowering students in the biological sciences with quantitative tools, particularly data analysis skills. Beyond biologists, he is thrilled to develop courses for DataCamp, whose students are an excited bunch of burgeoning data scientists!