Preprocessing for Machine Learning in Python
Learn how to clean and prepare your data for machine learning!
4 hours | 20 videos | 62 exercises | 50,958 learners | Statement of Accomplishment
Course Description
This course covers the basics of how and when to perform data preprocessing. This essential step in any machine learning project is where you get your data ready for modeling: it comes after you import and clean your data and before you fit your machine learning model. You'll learn how to standardize your data so that it's in the right form for your model, create new features that make the best use of the information in your dataset, and select the features that most improve your model fit. Finally, you'll practice preprocessing by getting a dataset of UFO sightings ready for modeling.
This course is part of the following track:
Machine Learning Scientist with Python
Introduction to Data Preprocessing
(Free) In this chapter you'll learn exactly what it means to preprocess data. You'll take the first steps of any preprocessing journey, including exploring data types and dealing with missing data.
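A minimal, dependency-free sketch of those first steps, assuming rows are stored as Python dicts and missing values as `None`. The toy records and the helper names (`column_types`, `missing_counts`, `drop_missing`) are hypothetical; with pandas you would typically reach for `DataFrame.dtypes`, `DataFrame.isna()`, and `DataFrame.dropna()` instead.

```python
from collections import Counter

# Hypothetical toy records; None marks a missing value.
records = [
    {"age": 34, "city": "Boston", "income": 72000.0},
    {"age": None, "city": "Denver", "income": 58000.0},
    {"age": 29, "city": None, "income": 61000.0},
    {"age": 41, "city": "Austin", "income": 80000.0},
]


def column_types(rows):
    """Map each column to the set of type names it holds, ignoring missing values."""
    types = {}
    for row in rows:
        for col, val in row.items():
            if val is not None:
                types.setdefault(col, set()).add(type(val).__name__)
    return types


def missing_counts(rows):
    """Count missing (None) values per column."""
    counts = Counter()
    for row in rows:
        for col, val in row.items():
            if val is None:
                counts[col] += 1
    return dict(counts)


def drop_missing(rows):
    """Keep only rows with no missing values, similar to a row-wise dropna."""
    return [row for row in rows if all(val is not None for val in row.values())]


print(column_types(records))
print(missing_counts(records))
print(len(drop_missing(records)), "complete rows")
```

Checking types first matters because a numeric column stored as strings, or a column with mixed types, usually needs converting before any scaling step.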
Standardizing Data
This chapter is all about standardizing data. Models often make assumptions about the distribution or scale of your features. Standardization transforms your data to meet these assumptions, which can improve the algorithm's performance.
- Standardization (50 XP)
- When to standardize (50 XP)
- Modeling without normalizing (100 XP)
- Log normalization (50 XP)
- Checking the variance (50 XP)
- Log normalization in Python (100 XP)
- Scaling data for feature comparison (50 XP)
- Scaling data - investigating columns (50 XP)
- Scaling data - standardizing columns (100 XP)
- Standardized data and modeling (50 XP)
- KNN on non-scaled data (100 XP)
- KNN on scaled data (100 XP)
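The two transformations this chapter names, log normalization and standardization, can be sketched with the standard library alone. This assumes a single numeric column held in a list; the sample values and function names are hypothetical. `fit_standardizer` uses the population standard deviation, which matches what scikit-learn's `StandardScaler` computes, and learns its parameters from the training values only so that no test-set information leaks into the scaling.

```python
import math
import statistics


def log_normalize(values):
    """Natural-log transform; shrinks the spread of a strictly positive, right-skewed column."""
    return [math.log(v) for v in values]


def fit_standardizer(train_values):
    """Learn the mean and population standard deviation from the training data."""
    return statistics.fmean(train_values), statistics.pstdev(train_values)


def standardize(values, mean, std):
    """Z-score each value using parameters learned by fit_standardizer."""
    return [(v - mean) / std for v in values]


# Hypothetical training and test values for one column.
train = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
test = [3.0, 6.0]

mean, std = fit_standardizer(train)
print(standardize(train, mean, std))
print(standardize(test, mean, std))

# Hypothetical high-variance column: its variance drops sharply after the log transform.
skewed = [10.0, 100.0, 1000.0, 10000.0]
print(statistics.pvariance(skewed), statistics.pvariance(log_normalize(skewed)))
```

Distance-based models such as KNN are a typical case where this matters: without scaling, the column with the largest numeric range dominates the distance calculation.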
Feature Engineering
In this chapter you'll learn about feature engineering. You'll explore different ways to create new, more useful features from the ones already in your dataset. You'll see how to encode, aggregate, and extract information from both numerical and text features.
- Feature engineering (50 XP)
- Feature engineering knowledge test (50 XP)
- Identifying areas for feature engineering (50 XP)
- Encoding categorical variables (50 XP)
- Encoding categorical variables - binary (100 XP)
- Encoding categorical variables - one-hot (100 XP)
- Engineering numerical features (50 XP)
- Aggregating numerical features (100 XP)
- Extracting datetime components (100 XP)
- Engineering text features (50 XP)
- Extracting string patterns (100 XP)
- Vectorizing text (100 XP)
- Text classification using tf/idf vectors (100 XP)
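Several of the feature-engineering techniques above, sketched in plain Python under the assumption that each column is a list. The helpers and sample values are hypothetical; in pandas the rough equivalents are `pd.get_dummies` for one-hot encoding, `pd.to_datetime(...).dt.month` for datetime components, and `Series.str.extract` for pulling patterns out of strings.

```python
import re
from datetime import datetime


def binary_encode(values, positive):
    """Encode a two-level categorical column as 1 (the positive label) or 0."""
    return [1 if v == positive else 0 for v in values]


def one_hot(values):
    """Return the sorted category list and one 0/1 indicator row per value."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]


def row_means(*columns):
    """Aggregate several numeric columns into one feature: the per-row mean."""
    return [sum(vals) / len(vals) for vals in zip(*columns)]


def extract_month(date_strings, fmt="%Y-%m-%d"):
    """Parse date strings and keep only the month component."""
    return [datetime.strptime(s, fmt).month for s in date_strings]


def extract_number(text):
    """Return the first integer or decimal number in a string, or None if there is none."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    return float(match.group()) if match else None


print(binary_encode(["y", "n", "y"], positive="y"))
print(one_hot(["hiking", "running", "hiking", "cycling"]))
print(row_means([10.0, 12.0], [14.0, 16.0]))
print(extract_month(["2011-03-15", "1999-12-01"]))
print(extract_number("Temperature: 75.5 F"))
```

One-hot encoding adds one column per category, so it suits low-cardinality columns; a column with only two levels needs just the single 0/1 column from `binary_encode`.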
Selecting Features for Modeling
This chapter goes over a few different techniques for selecting the most important features from your dataset. You'll learn how to drop redundant features, work with text vectors, and reduce the number of features in your dataset using principal component analysis (PCA).
- Feature selection (50 XP)
- When to use feature selection (50 XP)
- Identifying areas for feature selection (50 XP)
- Removing redundant features (50 XP)
- Selecting relevant features (100 XP)
- Checking for correlated features (100 XP)
- Selecting features using text vectors (50 XP)
- Exploring text vectors, part 1 (100 XP)
- Exploring text vectors, part 2 (100 XP)
- Training Naive Bayes with feature selection (100 XP)
- Dimensionality reduction (50 XP)
- Using PCA (100 XP)
- Training a model with PCA (100 XP)
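One way to remove redundant features is to drop one column from each strongly correlated pair. The sketch below assumes numeric columns stored in a dict of lists, computes Pearson's r by hand, and uses a hypothetical cutoff of 0.75; both that threshold and the keep-the-first-column rule are illustrative choices, not fixed rules. It covers only the correlation check, not text vectors or PCA. In pandas, `DataFrame.corr()` returns the full correlation matrix directly.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; NaN if either is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if std_x == 0 or std_y == 0:
        # r is undefined for a constant column; NaN never exceeds the threshold.
        return float("nan")
    return cov / (std_x * std_y)


def drop_correlated(columns, threshold=0.75):
    """Return column names to keep, dropping the later column of any pair with |r| > threshold."""
    names = list(columns)
    dropped = set()
    for a, b in combinations(names, 2):
        if a in dropped or b in dropped:
            continue
        if abs(pearson(columns[a], columns[b])) > threshold:
            dropped.add(b)
    return [name for name in names if name not in dropped]


# Hypothetical columns: "width" nearly duplicates "length".
features = {
    "length": [1.0, 2.0, 3.0, 4.0, 5.0],
    "width": [2.0, 4.0, 6.0, 8.0, 10.5],
    "weight": [3.0, 5.0, 1.0, 4.0, 2.0],
}
print(drop_correlated(features))
```

Dropping one of two near-duplicate columns usually loses little information while reducing noise and the risk of overfitting.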
Putting It All Together
Now that you've learned all about preprocessing, you'll try these techniques out on a dataset that records information on UFO sightings.
- UFOs and preprocessing (50 XP)
- Checking column types (100 XP)
- Dropping missing data (100 XP)
- Categorical variables and standardization (50 XP)
- Extracting numbers from strings (100 XP)
- Identifying features for standardization (100 XP)
- Engineering new features (50 XP)
- Encoding categorical variables (100 XP)
- Features from dates (100 XP)
- Text vectorization (100 XP)
- Feature selection and modeling (50 XP)
- Selecting the ideal dataset (100 XP)
- Modeling the UFO dataset, part 1 (100 XP)
- Modeling the UFO dataset, part 2 (100 XP)
- Congratulations! (50 XP)
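Text vectorization appears both in the feature-engineering chapter and in this case study. The following is a minimal tf-idf sketch, assuming whitespace tokenization and the textbook weighting tf(t, d) * ln(N / df(t)), where N is the number of documents and df(t) is how many of them contain term t. The function name and sample sentences are hypothetical. scikit-learn's `TfidfVectorizer` uses a smoothed idf and L2-normalizes each row by default, so its numbers will differ from these.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocabulary, vectors) using raw-frequency tf and unsmoothed idf."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears at least once.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    vocab = sorted(doc_freq)
    idf = {term: math.log(n_docs / doc_freq[term]) for term in vocab}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # Term frequency is the term's share of the document's tokens; empty docs get zeros.
        vectors.append([counts[t] / total * idf[t] if total else 0.0 for t in vocab])
    return vocab, vectors


sightings = [
    "bright light in the sky",
    "the disk hovered",
    "bright disk in the sky",
]
vocab, vectors = tfidf(sightings)
print(vocab)
for row in vectors:
    print([round(w, 3) for w in row])
```

A term that appears in every document, such as "the" here, gets an idf of ln(1) = 0 and so contributes nothing, which is why tf-idf downweights words that do not help distinguish one document from another.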
Contributors
James Chapman
Curriculum Manager, DataCamp