Feature Engineering with PySpark
Learn the gritty details that data scientists spend 70-80% of their time on: data wrangling and feature engineering.
4 hours | 16 videos | 60 exercises | 14,806 learners | Statement of accomplishment
Course Description
The real world is messy, and your job is to make sense of it. Toy datasets like MTCars and Iris are the result of careful curation and cleaning; even so, their data must be transformed before machine learning algorithms can use it to extract meaning, forecast, classify, or cluster. This course covers the gritty details that data scientists spend 70-80% of their time on: data wrangling and feature engineering. With datasets growing ever larger, let's use PySpark to cut this Big Data problem down to size!
Included in the following track: Big Data with PySpark

1. Exploratory Data Analysis
Free. Get to know a bit about your problem before you dive in! Then learn how to inspect your dataset statistically and visually.
- Where to Begin (50 xp)
- Where to begin? (50 xp)
- Check Version (100 xp)
- Load in the data (100 xp)
- Defining A Problem (50 xp)
- What are we predicting? (100 xp)
- Verifying Data Load (100 xp)
- Verifying DataTypes (100 xp)
- Visually Inspecting Data / EDA (50 xp)
- Using Corr() (100 xp)
- Using Visualizations: distplot (100 xp)
- Using Visualizations: lmplot (100 xp)

2. Wrangling with Spark Functions
Real data is rarely clean and ready for analysis. In this chapter, learn to remove unneeded information, handle missing values, and add additional data to your analysis.
- Dropping data (50 xp)
- Dropping a list of columns (100 xp)
- Using text filters to remove records (100 xp)
- Filtering numeric fields conditionally (100 xp)
- Adjusting Data (50 xp)
- Custom Percentage Scaling (100 xp)
- Scaling your scalers (100 xp)
- Correcting Right Skew Data (100 xp)
- Working with Missing Data (50 xp)
- Visualizing Missing Data (100 xp)
- Imputing Missing Data (100 xp)
- Calculate Missing Percents (100 xp)
- Getting More Data (50 xp)
- A Dangerous Join (100 xp)
- Spark SQL Join (100 xp)
- Checking for Bad Joins (100 xp)

3. Feature Engineering
In this chapter, learn how to create new features for your machine learning model to learn from. We'll look at generating them by combining fields, extracting values from messy columns, or encoding them for better results.
- Feature Generation (50 xp)
- Differences (100 xp)
- Ratios (100 xp)
- Deeper Features (100 xp)
- Time Features (50 xp)
- Time Components (100 xp)
- Joining On Time Components (100 xp)
- Date Math (100 xp)
- Extracting Features (50 xp)
- Extracting Text to New Features (100 xp)
- Splitting & Exploding (100 xp)
- Pivot & Join (100 xp)
- Binarizing, Bucketing & Encoding (50 xp)
- Binarizing Day of Week (100 xp)
- Bucketing (100 xp)
- One Hot Encoding (100 xp)

4. Building a Model
In this chapter, we'll learn how to choose a type of model, then how to fit it to our data and evaluate it. Lastly, we'll learn how to interpret the results and save the model for later!
- Choosing the Algorithm (50 xp)
- Which MLlib Module? (50 xp)
- Creating Time Splits (100 xp)
- Adjusting Time Features (100 xp)
- Feature Engineering Assumptions for RFR (50 xp)
- Feature Engineering For Random Forests (50 xp)
- Dropping Columns with Low Observations (100 xp)
- Naively Handling Missing and Categorical Values (100 xp)
- Building a Model (50 xp)
- Building a Regression Model (100 xp)
- Evaluating & Comparing Algorithms (100 xp)
- Understanding Metrics (50 xp)
- Interpreting, Saving & Loading (50 xp)
- Interpreting Results (100 xp)
- Saving & Loading Models (100 xp)
- Final Thoughts (50 xp)
Datasets
2017 St Paul MN Real Estate Dataset

Collaborators
John Hogue
Lead Data Scientist, General Mills