Feature Engineering with PySpark
Learn the gritty details that data scientists spend 70-80% of their time on: data wrangling and feature engineering.
4 hours, 16 videos, 60 exercises, 14,743 learners, Statement of Accomplishment
Course Description
The real world is messy, and your job is to make sense of it. Toy datasets like MTCars and Iris are the result of careful curation and cleaning, and even they need to be transformed before machine learning algorithms can use them to extract meaning, forecast, classify, or cluster. This course covers the gritty details that data scientists spend 70-80% of their time on: data wrangling and feature engineering. With datasets growing ever larger, let's use PySpark to cut this Big Data problem down to size!
In the following tracks
Big Data with PySpark
1. Exploratory Data Analysis
Free
Get to know a bit about your problem before you dive in, then learn how to inspect your dataset statistically and visually.
- Where to Begin (50 xp)
- Where to begin? (50 xp)
- Check Version (100 xp)
- Load in the data (100 xp)
- Defining A Problem (50 xp)
- What are we predicting? (100 xp)
- Verifying Data Load (100 xp)
- Verifying DataTypes (100 xp)
- Visually Inspecting Data / EDA (50 xp)
- Using Corr() (100 xp)
- Using Visualizations: distplot (100 xp)
- Using Visualizations: lmplot (100 xp)
2. Wrangling with Spark Functions
Real data is rarely clean and ready for analysis. In this chapter, learn to remove unneeded information, handle missing values, and add additional data to your analysis.
- Dropping data (50 xp)
- Dropping a list of columns (100 xp)
- Using text filters to remove records (100 xp)
- Filtering numeric fields conditionally (100 xp)
- Adjusting Data (50 xp)
- Custom Percentage Scaling (100 xp)
- Scaling your scalers (100 xp)
- Correcting Right Skew Data (100 xp)
- Working with Missing Data (50 xp)
- Visualizing Missing Data (100 xp)
- Imputing Missing Data (100 xp)
- Calculate Missing Percents (100 xp)
- Getting More Data (50 xp)
- A Dangerous Join (100 xp)
- Spark SQL Join (100 xp)
- Checking for Bad Joins (100 xp)
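The exercise titles in this chapter mention percentage scaling, correcting right skew, and missing percentages. As a rough illustration of the arithmetic behind those ideas, here is a minimal sketch in plain Python rather than PySpark, so it runs without a Spark session. It assumes that "percentage scaling" means min-max scaling to the 0-100 range and that skew is reduced with a natural log; the course's exact approach is not stated on this page, and the function names and sample values are hypothetical.

```python
import math


def percentage_scale(values):
    """Min-max scale values to 0-100 (assumed meaning of 'percentage scaling')."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) * 100 for v in values]


def log_transform(values):
    """Compress a right-skewed, strictly positive column with a natural log."""
    return [math.log(v) for v in values]


def missing_percent(values):
    """Percentage of entries that are missing (None)."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values) * 100


# Hypothetical columns from a real estate listing table.
days_on_market = [0, 10, 40, 50]
sale_prices = [120_000.0, 180_000.0, 250_000.0, 2_400_000.0]
year_built = [1995, None, 2004, None]

print(percentage_scale(days_on_market))
print(log_transform(sale_prices))
print(missing_percent(year_built))
```

On a Spark DataFrame the same calculations would be expressed as column expressions so they can run across a cluster; the logic itself is unchanged.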
3. Feature Engineering
In this chapter, learn how to create new features for your machine learning model to learn from. We'll look at generating them by combining fields, extracting values from messy columns, or encoding them for better results.
- Feature Generation (50 xp)
- Differences (100 xp)
- Ratios (100 xp)
- Deeper Features (100 xp)
- Time Features (50 xp)
- Time Components (100 xp)
- Joining On Time Components (100 xp)
- Date Math (100 xp)
- Extracting Features (50 xp)
- Extracting Text to New Features (100 xp)
- Splitting & Exploding (100 xp)
- Pivot & Join (100 xp)
- Binarizing, Bucketing & Encoding (50 xp)
- Binarizing Day of Week (100 xp)
- Bucketing (100 xp)
- One Hot Encoding (100 xp)
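To make a few of these feature types concrete (differences, ratios, time components, date math, bucketing, and one-hot encoding), the following is a small plain-Python sketch. It is not the course's code and does not use the PySpark API; the field names (`list_price`, `sale_price`, `sqft`, `list_date`, `sale_date`), the split boundaries, and the category list are all hypothetical.

```python
from bisect import bisect_right
from datetime import date


def add_features(listing):
    """Return a copy of a listing dict with a few derived features."""
    out = dict(listing)
    # Difference and ratio of existing numeric fields.
    out["price_diff"] = listing["list_price"] - listing["sale_price"]
    out["price_per_sqft"] = listing["sale_price"] / listing["sqft"]
    # Time components pulled out of a date.
    out["list_year"] = listing["list_date"].year
    out["list_month"] = listing["list_date"].month
    out["list_dow"] = listing["list_date"].isoweekday()  # 1 = Monday
    # Date math: elapsed days between two dates.
    out["days_on_market"] = (listing["sale_date"] - listing["list_date"]).days
    return out


def bucketize(value, splits):
    """Return the index i such that splits[i] <= value < splits[i + 1]."""
    if not splits[0] <= value < splits[-1]:
        raise ValueError("value falls outside the split boundaries")
    return bisect_right(splits, value) - 1


def one_hot(value, categories):
    """Encode value as a 0/1 list with one position per known category."""
    return [1 if value == c else 0 for c in categories]


# Hypothetical listing record.
listing = {
    "list_price": 250_000,
    "sale_price": 240_000,
    "sqft": 1_600,
    "list_date": date(2017, 3, 1),
    "sale_date": date(2017, 3, 31),
}
features = add_features(listing)
print(features)
print(bucketize(7, [0, 1, 2, 3, 4, float("inf")]))
print(one_hot("condo", ["house", "condo", "townhouse"]))
```

Bucketing here treats each interval as closed on the left and open on the right, with an infinite upper bound acting as a catch-all for large values. That convention is a design choice for this sketch.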
4. Building a Model
In this chapter, we'll learn how to choose which type of model we want, how to apply our data to the model, and how to evaluate it. Lastly, we'll learn how to interpret the results and save the model for later!
- Choosing the Algorithm (50 xp)
- Which MLlib Module? (50 xp)
- Creating Time Splits (100 xp)
- Adjusting Time Features (100 xp)
- Feature Engineering Assumptions for RFR (50 xp)
- Feature Engineering For Random Forests (50 xp)
- Dropping Columns with Low Observations (100 xp)
- Naively Handling Missing and Categorical Values (100 xp)
- Building a Model (50 xp)
- Building a Regression Model (100 xp)
- Evaluating & Comparing Algorithms (100 xp)
- Understanding Metrics (50 xp)
- Interpreting, Saving & Loading (50 xp)
- Interpreting Results (100 xp)
- Saving & Loading Models (100 xp)
- Final Thoughts (50 xp)
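One item in this chapter, "Creating Time Splits", can be pictured with a short plain-Python sketch. It assumes a time split means training on records dated before a cutoff and testing on records dated on or after it, so the model is never evaluated on data older than what it was trained on. The helper name, field names, and cutoff date are hypothetical, and this is not the course's implementation.

```python
from datetime import date


def time_split(rows, date_key, cutoff):
    """Split rows into (train, test): train before cutoff, test on or after it."""
    train = [r for r in rows if r[date_key] < cutoff]
    test = [r for r in rows if r[date_key] >= cutoff]
    return train, test


# Hypothetical listings with their list dates.
listings = [
    {"id": 1, "list_date": date(2017, 1, 15)},
    {"id": 2, "list_date": date(2017, 6, 1)},
    {"id": 3, "list_date": date(2017, 11, 20)},
]
train, test = time_split(listings, "list_date", date(2017, 10, 1))
print(len(train), len(test))
```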
Datasets
2017 St Paul MN Real Estate Dataset
Collaborators
John Hogue
Lead Data Scientist, General Mills