Handling Missing Data with Imputations in R
Diagnose, visualize, and treat missing data with a range of imputation techniques, plus tips to improve your results.
4 hours | 13 videos | 49 exercises | 5,160 learners | Statement of Accomplishment
Course Description
Missing data is everywhere. The process of filling in missing values is known as imputation, and knowing how to correctly fill in missing data is an essential skill if you want to produce accurate predictions and distinguish yourself from the crowd. In this course, you’ll learn how to use visualizations and statistical tests to recognize missing data patterns and how to impute data using a collection of statistical and machine learning models. You’ll also gain decision-making skills, helping you decide which imputation method fits best in a particular situation. Finally, you’ll learn to incorporate uncertainty from imputation into your inference and predictions, making them more robust and reliable.
1. The Problem of Missing Data
Free. In this chapter, you’ll find out why missing data can be a risk when analyzing a dataset. You’ll be introduced to the three missing data mechanisms and learn how to recognize them using statistical tests and visualization tools.
- Missing data: what can go wrong (50 xp)
- Linear regression with incomplete data (100 xp)
- Analyzing regression output (50 xp)
- Comparing models (100 xp)
- Missing data mechanisms (50 xp)
- Recognizing missing data mechanisms (100 xp)
- t-test for MAR: data preparation (100 xp)
- t-test for MAR: interpretation (100 xp)
- Visualizing missing data patterns (50 xp)
- Aggregation plot (100 xp)
- Spine plot (100 xp)
- Mosaic plot (100 xp)

2. Donor-Based Imputation
Get to know the taxonomy of imputation methods and learn three donor-based techniques: mean, hot-deck, and k-Nearest-Neighbors imputation. You’ll look under the hood to see how these methods work, before learning how to apply them to a real-world tropical weather dataset. Along the way, you’ll also learn useful tricks that you can use to make them work even better for your problems.
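To make the donor-based idea concrete, here is a minimal sketch of two of the techniques named above: mean imputation and a simple random hot-deck. The course itself works in R; this illustration uses plain Python, and the function names and temperature values are made up for the example rather than taken from the course material.

```python
import random
import statistics


def mean_impute(values):
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in values]


def hot_deck_impute(values, rng):
    """Replace each missing value (None) with a randomly chosen observed 'donor'."""
    observed = [v for v in values if v is not None]
    return [rng.choice(observed) if v is None else v for v in values]


# Hypothetical temperature readings; None marks a missing value.
temps = [27.1, None, 28.4, 26.9, None, 29.0, 27.7]
observed = [v for v in temps if v is not None]

mean_filled = mean_impute(temps)
deck_filled = hot_deck_impute(temps, random.Random(0))

# Mean imputation adds values at the center of the data, so the spread shrinks.
print(statistics.pvariance(observed), statistics.pvariance(mean_filled))
```

The printed variances illustrate one commonly cited drawback of mean imputation: every filled-in value sits exactly at the mean, so the imputed series varies less than the observed data. The hot-deck version reuses real observed values instead, which keeps imputations within the range of the data.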
- Mean imputation (50 xp)
- Smelling the danger of mean imputation (100 xp)
- Mean-imputing the temperature (100 xp)
- Assessing imputation quality with margin plot (100 xp)
- Hot-deck imputation (50 xp)
- Vanilla hot-deck (100 xp)
- Hot-deck tricks & tips I: imputing within domains (100 xp)
- Hot-deck tricks & tips II: sorting by correlated variables (100 xp)
- k-Nearest-Neighbors imputation (50 xp)
- Choosing the number of neighbors (100 xp)
- kNN tricks & tips I: weighting donors (100 xp)
- kNN tricks & tips II: sorting variables (100 xp)

3. Model-Based Imputation
It’s time to learn how to use statistical and machine learning models, such as linear regression, logistic regression, and random forests, to impute missing data. In this chapter, you’ll look into how the models make their predictions and use this knowledge to draw the imputed values from conditional distributions. This is important as it ensures your imputations are more varied and plausible, making them more similar to the true data.
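The difference between imputing a model's point prediction and drawing from its conditional distribution can be sketched as follows. This is an illustrative Python example, not the course's R code: it fits a simple one-predictor linear regression on the complete cases, then either fills gaps with the fitted value or adds normally distributed noise scaled by the residual standard deviation. The data and function names are hypothetical, and the sketch ignores uncertainty in the fitted coefficients themselves.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x, plus residual sd."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    # Residual standard deviation with n - 2 degrees of freedom.
    sigma = (sum(r * r for r in residuals) / (len(xs) - 2)) ** 0.5
    return intercept, slope, sigma


def impute_regression(xs, ys, rng=None):
    """Fill missing ys (None) from a regression on x.

    With rng=None the fitted value is used; with an rng, a draw from
    Normal(fitted value, sigma) is used instead.
    """
    complete = [(x, y) for x, y in zip(xs, ys) if y is not None]
    intercept, slope, sigma = fit_line(
        [x for x, _ in complete], [y for _, y in complete]
    )
    filled = []
    for x, y in zip(xs, ys):
        if y is None:
            y = intercept + slope * x
            if rng is not None:
                y += rng.gauss(0.0, sigma)
        filled.append(y)
    return filled


# Made-up data; None marks a missing response.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, None, 8.2, None, 11.8]

deterministic = impute_regression(xs, ys)
stochastic = impute_regression(xs, ys, rng=random.Random(42))
print(deterministic)
print(stochastic)
```

Deterministic imputations fall exactly on the fitted line, which understates variability; the stochastic version scatters them around the line in the same way the observed points are scattered.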
- Model-based imputation approach (50 xp)
- Linear regression imputation (100 xp)
- Initializing missing values & iterating over variables (100 xp)
- Detecting convergence (100 xp)
- Replicating data variability (50 xp)
- Logistic regression imputation (100 xp)
- Drawing from conditional distribution (100 xp)
- Model-based imputation with multiple variable types (100 xp)
- Tree-based imputation (50 xp)
- Imputing with random forests (100 xp)
- Variable-wise imputation errors (100 xp)
- Speed-accuracy trade-off (100 xp)

4. Uncertainty from Imputation
Imputed values are not set in stone. They are just estimates, and estimates come with some uncertainty. In this final chapter, you’ll discover how bootstrapping and chained equations, the latter via the mice package, can be used to incorporate imputation uncertainty into your models and analyses to make them more reliable and robust.
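As a rough illustration of the bootstrapping idea, the sketch below resamples the incomplete data with replacement, imputes each resample, computes a statistic on it, and reads a percentile confidence interval off the resulting distribution. It is written in plain Python rather than R, uses mean imputation and the median purely as stand-ins for whatever imputation method and analysis you would actually run, and all names and values are hypothetical.

```python
import random
import statistics


def mean_impute(values):
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in values]


def bootstrap_estimates(values, n_boot, rng):
    """Impute and analyze n_boot bootstrap resamples of the incomplete data."""
    estimates = []
    while len(estimates) < n_boot:
        sample = rng.choices(values, k=len(values))  # resample with replacement
        if all(v is None for v in sample):
            continue  # nothing observed to impute from; draw again
        estimates.append(statistics.median(mean_impute(sample)))
    return estimates


# Made-up measurements; None marks a missing value.
data = [27.1, None, 28.4, 26.9, None, 29.0, 27.7, 28.1, None, 27.5]

estimates = bootstrap_estimates(data, n_boot=1000, rng=random.Random(1))

# Percentile interval: the 2.5% and 97.5% cut points of the bootstrap estimates.
cuts = statistics.quantiles(estimates, n=40)
lower, upper = cuts[0], cuts[-1]
print(lower, upper)
```

Because imputation is repeated inside every resample, the width of the interval reflects both sampling variability and the variability introduced by filling in the gaps.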
- Multiple imputation by bootstrapping (50 xp)
- Wrapping imputation & modeling in a function (100 xp)
- Running the bootstrap (100 xp)
- Bootstrapping confidence intervals (100 xp)
- Multiple imputation by chained equations (50 xp)
- The mice flow: mice - with - pool (100 xp)
- Choosing default models (100 xp)
- Using predictor matrix (100 xp)
- Putting it all together (50 xp)
- Analyzing missing data patterns (100 xp)
- Imputing and inspecting outcomes (100 xp)
- Inference with imputed data (100 xp)
- Final remarks (50 xp)
Collaborators
Michał Oleszak, Machine Learning Engineer