Preprocessing for Machine Learning in Python
Learn how to clean and prepare your data for machine learning!
4 hours, 20 videos, 62 exercises, 50,958 learners. Includes a Statement of Accomplishment.
Course Description
This course covers the basics of how and when to perform data preprocessing. This essential step in any machine learning project is when you get your data ready for modeling: it comes after you have imported and cleaned your data and before you fit your machine learning model. You'll learn how to standardize your data so that it's in the right form for your model, create new features that make the best use of the information in your dataset, and select the features that most improve your model's fit. Finally, you'll practice preprocessing by getting a dataset of UFO sightings ready for modeling.
Included in the following track:
Machine Learning Scientist in Python

1. Introduction to Data Preprocessing
Free. In this chapter you'll learn exactly what it means to preprocess data. You'll take the first steps of any preprocessing workflow, including exploring data types and dealing with missing data.
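A minimal sketch of those first two steps, assuming pandas. The DataFrame, its column names, and its values are hypothetical and are not taken from the course materials.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: a numeric column stored as strings, plus scattered missing values.
df = pd.DataFrame({
    "age": [25, np.nan, 47, 31],
    "income": ["52000", "61000", np.nan, "48000"],
    "city": ["Paris", "Lyon", "Nice", None],
})

# Exploring data types: "income" is inferred as object (strings), not as a number.
print(df.dtypes)

# Convert it to a numeric dtype; the missing entry stays NaN.
df["income"] = pd.to_numeric(df["income"])

# Dealing with missing data: count missing values per column...
print(df.isna().sum())

# ...then either drop every row that has any missing value...
complete_rows = df.dropna()

# ...or drop rows only where a specific column is missing.
rows_with_age = df.dropna(subset=["age"])
```

Dropping every incomplete row is the bluntest option; restricting `dropna` with `subset` keeps more data when only some columns matter for the model.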
2. Standardizing Data
This chapter is all about standardizing data. Many models make assumptions about the distribution or scale of your features. Standardization transforms your data to fit these assumptions, which can improve the algorithm's performance.
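A sketch of checking variance, log normalization, and the scaled-versus-unscaled KNN comparison, assuming scikit-learn. Its bundled wine dataset stands in for whatever data the course uses, and `random_state=42` is an arbitrary choice.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True, as_frame=True)

# Checking the variance: one column (proline) has a far larger variance than the rest.
print(X.var().sort_values(ascending=False).head(3))

# Log normalization: the natural log compresses a high-variance, strictly positive column.
X_log = X.copy()
X_log["proline"] = np.log(X_log["proline"])

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

# Baseline: KNN on unscaled features, where distances are dominated by large-valued columns.
knn = KNeighborsClassifier()
knn.fit(X_train, y_train)
unscaled_acc = knn.score(X_test, y_test)

# Standardization: fit the scaler on the training split only, then apply it to both splits.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

knn.fit(X_train_scaled, y_train)
scaled_acc = knn.score(X_test_scaled, y_test)
print(f"unscaled: {unscaled_acc:.3f}  scaled: {scaled_acc:.3f}")
```

Exact accuracies depend on the split. Fitting the scaler on the training data alone keeps test-set statistics from leaking into preprocessing.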
- Standardization (50 xp)
- When to standardize (50 xp)
- Modeling without normalizing (100 xp)
- Log normalization (50 xp)
- Checking the variance (50 xp)
- Log normalization in Python (100 xp)
- Scaling data for feature comparison (50 xp)
- Scaling data - investigating columns (50 xp)
- Scaling data - standardizing columns (100 xp)
- Standardized data and modeling (50 xp)
- KNN on non-scaled data (100 xp)
- KNN on scaled data (100 xp)

3. Feature Engineering
In this chapter you'll learn about feature engineering. You'll explore different ways to create new, more useful features from the ones already in your dataset. You'll see how to encode, aggregate, and extract information from both numerical and textual features.
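A minimal pandas sketch of the encoding, aggregation, and extraction steps named above. The DataFrame and every column name in it are hypothetical, and the regular expression is one illustrative way to pull out a number, not necessarily the course's.

```python
import pandas as pd

# Hypothetical toy data; all column names and values are illustrative.
df = pd.DataFrame({
    "subscribed": ["y", "n", "n", "y"],
    "category": ["hiking", "running", "hiking", "cycling"],
    "run1": [5.0, 6.1, 4.8, 7.2],
    "run2": [5.2, 5.9, 5.1, 7.0],
    "date": ["2019-07-04", "2019-08-11", "2019-09-01", "2019-12-25"],
    "length": ["about 3.5 miles", "2 miles", "10.25 miles", "no data"],
})

# Binary encoding: map a two-valued column to 0/1.
df["subscribed_enc"] = df["subscribed"].map({"n": 0, "y": 1})

# One-hot encoding: one indicator column per category.
category_dummies = pd.get_dummies(df["category"], prefix="category")

# Aggregating numerical features: collapse related columns into a row-wise mean.
df["run_mean"] = df[["run1", "run2"]].mean(axis=1)

# Extracting datetime components: parse the strings, then pull out the month.
df["date"] = pd.to_datetime(df["date"])
df["month"] = df["date"].dt.month

# Extracting string patterns: capture the first number; rows without one become NaN.
df["length_num"] = df["length"].str.extract(r"(\d+\.?\d*)", expand=False).astype(float)
```

One-hot encoding suits categories with no natural order; the binary `map` is only appropriate when a column has exactly two values.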
- Feature engineering (50 xp)
- Feature engineering knowledge test (50 xp)
- Identifying areas for feature engineering (50 xp)
- Encoding categorical variables (50 xp)
- Encoding categorical variables - binary (100 xp)
- Encoding categorical variables - one-hot (100 xp)
- Engineering numerical features (50 xp)
- Aggregating numerical features (100 xp)
- Extracting datetime components (100 xp)
- Engineering text features (50 xp)
- Extracting string patterns (100 xp)
- Vectorizing text (100 xp)
- Text classification using tf/idf vectors (100 xp)

4. Selecting Features for Modeling
This chapter covers several techniques for selecting the most important features in your dataset. You'll learn how to drop redundant features, work with text vectors, and reduce the number of features in your dataset using principal component analysis (PCA).
- Feature selection (50 xp)
- When to use feature selection (50 xp)
- Identifying areas for feature selection (50 xp)
- Removing redundant features (50 xp)
- Selecting relevant features (100 xp)
- Checking for correlated features (100 xp)
- Selecting features using text vectors (50 xp)
- Exploring text vectors, part 1 (100 xp)
- Exploring text vectors, part 2 (100 xp)
- Training Naive Bayes with feature selection (100 xp)
- Dimensionality reduction (50 xp)
- Using PCA (100 xp)
- Training a model with PCA (100 xp)

5. Putting It All Together
Now that you've learned about preprocessing, you'll apply these techniques to a dataset that records information on UFO sightings.
- UFOs and preprocessing (50 xp)
- Checking column types (100 xp)
- Dropping missing data (100 xp)
- Categorical variables and standardization (50 xp)
- Extracting numbers from strings (100 xp)
- Identifying features for standardization (100 xp)
- Engineering new features (50 xp)
- Encoding categorical variables (100 xp)
- Features from dates (100 xp)
- Text vectorization (100 xp)
- Feature selection and modeling (50 xp)
- Selecting the ideal dataset (100 xp)
- Modeling the UFO dataset, part 1 (100 xp)
- Modeling the UFO dataset, part 2 (100 xp)
- Congratulations! (50 xp)
Collaborators
James Chapman
Curriculum Manager, DataCamp