Handling Missing Data with Imputations in R

Diagnose, visualize, and treat missing data using a range of imputation techniques, with tips to improve your results.

4 hours | 13 videos | 49 exercises | 5,160 learners | Statement of Accomplishment

Course Description

Missing data is everywhere. The process of filling in missing values is known as imputation, and knowing how to correctly fill in missing data is an essential skill if you want to produce accurate predictions and distinguish yourself from the crowd. In this course, you’ll learn how to use visualizations and statistical tests to recognize missing data patterns and how to impute data using a collection of statistical and machine learning models. You’ll also gain decision-making skills, helping you decide which imputation method fits best in a particular situation. Finally, you’ll learn to incorporate uncertainty from imputation into your inference and predictions, making them more robust and reliable.

1. The Problem of Missing Data (Free)
In this chapter, you’ll find out why missing data can be a risk when analyzing a dataset. You’ll be introduced to the three missing data mechanisms and learn how to recognize them using statistical tests and visualization tools.
Missing data: what can go wrong (50 xp)
Linear regression with incomplete data (100 xp)
Analyzing regression output (50 xp)
Comparing models (100 xp)
Missing data mechanisms (50 xp)
Recognizing missing data mechanisms (100 xp)
t-test for MAR: data preparation (100 xp)
t-test for MAR: interpretation (100 xp)
Visualizing missing data patterns (50 xp)
Aggregation plot (100 xp)
Spine plot (100 xp)
Mosaic plot (100 xp)

2. Donor-Based Imputation
Get to know the taxonomy of imputation methods and learn three donor-based techniques: mean, hot-deck, and k-Nearest-Neighbors imputation. You’ll look under the hood to see how these methods work, before learning how to apply them to a real-world tropical weather dataset. Along the way, you’ll also learn useful tricks that you can use to make them work even better for your problems.
Mean imputation (50 xp)
Smelling the danger of mean imputation (100 xp)
Mean-imputing the temperature (100 xp)
Assessing imputation quality with margin plot (100 xp)
Hot-deck imputation (50 xp)
Vanilla hot-deck (100 xp)
Hot-deck tricks & tips I: imputing within domains (100 xp)
Hot-deck tricks & tips II: sorting by correlated variables (100 xp)
k-Nearest-Neighbors imputation (50 xp)
Choosing the number of neighbors (100 xp)
kNN tricks & tips I: weighting donors (100 xp)
kNN tricks & tips II: sorting variables (100 xp)
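
As a rough illustration of why this chapter treats mean imputation with caution, the sketch below fills missing values with the observed mean and compares the spread before and after. It is written in plain Python for self-containment (the course itself works in R), and the `mean_impute` helper and the `air_temp` readings are hypothetical names and data made up for this example.

```python
import statistics


def mean_impute(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical air temperature readings with two missing values (made-up data).
air_temp = [22.0, None, 27.5, 24.0, None, 29.0, 23.5]

observed = [v for v in air_temp if v is not None]
imputed = mean_impute(air_temp)

# The mean is unchanged, but the sample variance shrinks because every
# filled-in value sits exactly at the centre of the distribution.
print("mean:    ", statistics.mean(observed), "->", statistics.mean(imputed))
print("variance:", statistics.variance(observed), "->", statistics.variance(imputed))
```

The shrinking variance is one reason donor-based alternatives such as hot-deck and k-Nearest-Neighbors, which reuse actually observed values, are often preferred.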

3. Model-Based Imputation
It’s time to learn how to use statistical and machine learning models, such as linear regression, logistic regression, and random forests, to impute missing data. In this chapter, you’ll look into how the models make their predictions and use this knowledge to draw the imputed values from conditional distributions. This is important as it ensures your imputations are more varied and plausible, making them more similar to the true data.
Model-based imputation approach (50 xp)
Linear regression imputation (100 xp)
Initializing missing values & iterating over variables (100 xp)
Detecting convergence (100 xp)
Replicating data variability (50 xp)
Logistic regression imputation (100 xp)
Drawing from conditional distribution (100 xp)
Model-based imputation with multiple variable types (100 xp)
Tree-based imputation (50 xp)
Imputing with random forests (100 xp)
Variable-wise imputation errors (100 xp)
Speed-accuracy trade-off (100 xp)
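
The idea of drawing imputed values from a conditional distribution, rather than plugging in the model's point prediction, can be sketched as follows. This is a minimal plain-Python illustration (the course itself works in R) that assumes a single predictor and normally distributed residuals; the `fit_line` and `impute` helpers and the data are hypothetical, invented for this example.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x, plus residual SD."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sigma = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    return intercept, slope, sigma


def impute(xs, ys, rng, stochastic=True):
    """Fill None entries of ys using a regression on xs fitted to complete pairs."""
    complete = [(x, y) for x, y in zip(xs, ys) if y is not None]
    intercept, slope, sigma = fit_line(
        [x for x, _ in complete], [y for _, y in complete]
    )
    filled = []
    for x, y in zip(xs, ys):
        if y is None:
            prediction = intercept + slope * x
            # Assumption: residuals are roughly normal, so we draw from
            # N(prediction, sigma) instead of using the prediction itself.
            y = rng.gauss(prediction, sigma) if stochastic else prediction
        filled.append(y)
    return filled


# Made-up data: y is missing for x = 4 and x = 7.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, None, 9.8, 12.1, None, 16.0]

print(impute(xs, ys, random.Random(0), stochastic=False))  # values on the fitted line
print(impute(xs, ys, random.Random(0), stochastic=True))   # values scattered around it
```

The deterministic variant places every imputed value exactly on the fitted line, which understates variability; the stochastic variant adds back the residual noise the model could not explain.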

4. Uncertainty from Imputation
Imputed values are not set in stone. They are just estimates, and estimates come with some uncertainty. In this final chapter, you’ll discover how bootstrapping and multiple imputation by chained equations, using the mice package, can incorporate imputation uncertainty into your models and analyses to make them more reliable and robust.
Multiple imputation by bootstrapping (50 xp)
Wrapping imputation & modeling in a function (100 xp)
Running the bootstrap (100 xp)
Bootstrapping confidence intervals (100 xp)
Multiple imputation by chained equations (50 xp)
The mice flow: mice - with - pool (100 xp)
Choosing default models (100 xp)
Using predictor matrix (100 xp)
Putting it all together (50 xp)
Analyzing missing data patterns (100 xp)
Imputing and inspecting outcomes (100 xp)
Inference with imputed data (100 xp)
Final remarks (50 xp)
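
The bootstrap approach from this chapter (wrap imputation and estimation in one function, rerun it on resampled data, and read a confidence interval off the resulting estimates) might look roughly like the sketch below. It is plain Python rather than R, uses mean imputation only as a stand-in for whatever imputation method is chosen, and computes a simple percentile interval; `impute_and_estimate`, `bootstrap_ci`, and the data are hypothetical.

```python
import random
import statistics


def impute_and_estimate(sample):
    """Impute missing values (here: with the observed mean), then estimate the mean."""
    observed = [v for v in sample if v is not None]
    fill = statistics.mean(observed)
    completed = [fill if v is None else v for v in sample]
    return statistics.mean(completed)


def bootstrap_ci(data, n_boot=1000, alpha=0.05, seed=42):
    """Percentile bootstrap interval for the imputed-data estimate."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        # Resample rows with replacement, missing values included.
        resample = rng.choices(data, k=len(data))
        if all(v is None for v in resample):
            continue  # nothing observed in this resample, so it cannot be imputed
        estimates.append(impute_and_estimate(resample))
    estimates.sort()
    lower = estimates[int((alpha / 2) * len(estimates))]
    upper = estimates[int((1 - alpha / 2) * len(estimates)) - 1]
    return lower, upper


# Made-up measurements with three missing values.
data = [4.2, None, 5.1, 3.8, None, 6.0, 4.9, 5.5, None, 4.4]

lower, upper = bootstrap_ci(data)
print(f"95% bootstrap interval: ({lower:.2f}, {upper:.2f})")
```

Because imputation is repeated inside every resample, the width of the interval reflects both sampling variability and the uncertainty introduced by filling in the gaps.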

Datasets

Biopics dataset, Tropical Atmosphere Ocean dataset

Collaborators

Amy Peterson

Adel Nehme

Prerequisites

Intermediate Regression in R, Dealing With Missing Data in R

Michał Oleszak

Machine Learning Engineer
