Feature Engineering with PySpark
Learn the gritty details that data scientists spend 70-80% of their time on: data wrangling and feature engineering.
4 hours | 16 videos | 60 exercises | 14,558 learners | Statement of Accomplishment
Course Description
The real world is messy, and your job is to make sense of it. Toy datasets like MTCars and Iris are the result of careful curation and cleaning; even so, their data must be transformed before machine learning algorithms can use it to extract meaning, forecast, classify, or cluster. This course covers the gritty details that data scientists spend 70-80% of their time on: data wrangling and feature engineering. With datasets growing ever larger, let's use PySpark to cut this Big Data problem down to size!
In the following tracks: Big Data with PySpark
1. Exploratory Data Analysis
Free chapter. Get to know your problem before you dive in, then learn how to inspect your dataset statistically and visually.
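As a rough illustration of what "statistically inspecting" a dataset can mean, the sketch below computes a Pearson correlation between two columns. It is written in plain Python rather than PySpark so it runs without a cluster; in PySpark, `DataFrame.corr(col1, col2)` returns the same statistic. The column names and values are made up for illustration and are not taken from the course dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical listings: living area in square feet and sale price in dollars.
living_area = [850, 1200, 1500, 2100, 2600]
sale_price = [120000, 165000, 210000, 280000, 355000]

r = pearson(living_area, sale_price)
print(f"correlation between living area and price: {r:.3f}")
```

A value close to 1 suggests the column is worth keeping as a candidate feature; a value near 0 suggests little linear relationship with the target.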
Where to Begin (50 xp)
Where to begin? (50 xp)
Check Version (100 xp)
Load in the data (100 xp)
Defining A Problem (50 xp)
What are we predicting? (100 xp)
Verifying Data Load (100 xp)
Verifying DataTypes (100 xp)
Visually Inspecting Data / EDA (50 xp)
Using Corr() (100 xp)
Using Visualizations: distplot (100 xp)
Using Visualizations: lmplot (100 xp)

2. Wrangling with Spark Functions
Real data is rarely clean and ready for analysis. In this chapter, learn to remove unneeded information, handle missing values, and add more data to your analysis.
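The steps named above (dropping, filtering, imputing, joining) can be sketched on a tiny in-memory table. This is a plain-Python stand-in, not the course's code: in PySpark the corresponding operations would be `DataFrame.drop`, `DataFrame.filter`, `DataFrame.fillna`, and `DataFrame.join`. All records, column names, and the `city_by_id` lookup are hypothetical.

```python
# Hypothetical raw records; None marks a missing value.
records = [
    {"id": 1, "sqft": 1200, "price": 165000, "notes": "n/a"},
    {"id": 2, "sqft": None, "price": 210000, "notes": "n/a"},
    {"id": 3, "sqft": 2100, "price": 280000, "notes": "n/a"},
    {"id": 4, "sqft": 1500, "price": None, "notes": "n/a"},
]

# 1. Remove unneeded information: drop the uninformative "notes" column.
cleaned = [{k: v for k, v in row.items() if k != "notes"} for row in records]

# 2. Remove records whose target value (price) is missing.
cleaned = [row for row in cleaned if row["price"] is not None]

# 3. Handle missing values: impute missing sqft with the mean of observed values.
observed = [row["sqft"] for row in cleaned if row["sqft"] is not None]
mean_sqft = sum(observed) / len(observed)
for row in cleaned:
    if row["sqft"] is None:
        row["sqft"] = mean_sqft

# 4. Add data: a left join against a second (hypothetical) table keyed by id.
city_by_id = {1: "St Paul", 3: "Maplewood"}
for row in cleaned:
    row["city"] = city_by_id.get(row["id"])  # None when there is no match

for row in cleaned:
    print(row)
```

Note that the join in step 4 leaves `city` empty for unmatched rows, which is one way a join can quietly reintroduce missing values.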
Dropping data (50 xp)
Dropping a list of columns (100 xp)
Using text filters to remove records (100 xp)
Filtering numeric fields conditionally (100 xp)
Adjusting Data (50 xp)
Custom Percentage Scaling (100 xp)
Scaling your scalers (100 xp)
Correcting Right Skew Data (100 xp)
Working with Missing Data (50 xp)
Visualizing Missing Data (100 xp)
Imputing Missing Data (100 xp)
Calculate Missing Percents (100 xp)
Getting More Data (50 xp)
A Dangerous Join (100 xp)
Spark SQL Join (100 xp)
Checking for Bad Joins (100 xp)

3. Feature Engineering
In this chapter, learn how to create new features for your machine learning model to learn from. We'll generate them by combining fields, extracting values from messy columns, and encoding them for better results.
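A minimal sketch of the three kinds of feature mentioned above, applied to a single hypothetical listing. It uses plain Python so it runs anywhere; in PySpark, comparable building blocks include column arithmetic, `pyspark.sql.functions.regexp_extract`, and `pyspark.ml.feature.OneHotEncoder`. The field names, the description text, and the list of styles are assumptions made for the example.

```python
import re

# One hypothetical listing with a free-text description.
listing = {
    "price": 280000,
    "sqft": 2100,
    "description": "Charming 3 bed home with 2 car garage",
    "style": "rambler",
}

# Combining fields: a ratio feature (price per square foot).
price_per_sqft = listing["price"] / listing["sqft"]

# Extracting a value from a messy text column: number of garage spaces.
match = re.search(r"(\d+)\s*car garage", listing["description"], re.IGNORECASE)
garage_spaces = int(match.group(1)) if match else 0

# Encoding a categorical value: one-hot vector over a fixed category list.
styles = ["colonial", "rambler", "split-level"]
style_onehot = [1 if listing["style"] == s else 0 for s in styles]

print(round(price_per_sqft, 2), garage_spaces, style_onehot)
```

Fixing the category list up front keeps the one-hot vector the same length for every row, including rows whose style never appeared in the training data.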
Feature Generation (50 xp)
Differences (100 xp)
Ratios (100 xp)
Deeper Features (100 xp)
Time Features (50 xp)
Time Components (100 xp)
Joining On Time Components (100 xp)
Date Math (100 xp)
Extracting Features (50 xp)
Extracting Text to New Features (100 xp)
Splitting & Exploding (100 xp)
Pivot & Join (100 xp)
Binarizing, Bucketing & Encoding (50 xp)
Binarizing Day of Week (100 xp)
Bucketing (100 xp)
One Hot Encoding (100 xp)

4. Building a Model
In this chapter, we'll learn how to choose which type of model to use, then how to fit it to our data and evaluate it. Lastly, we'll learn how to interpret the results and save the model for later!
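The workflow described here (split by time, fit, evaluate, save) can be sketched end to end in a few lines. To stay self-contained, this example swaps in a one-feature least-squares line as a stand-in model; the course itself works with Spark's ML library, where `pyspark.ml.regression.RandomForestRegressor` and `pyspark.ml.evaluation.RegressionEvaluator` play these roles. The sales records, the cutoff date, and the JSON format for saving are all assumptions for illustration.

```python
import json
from datetime import date
from math import sqrt

# Hypothetical sales: (sale date, square feet, sale price).
sales = [
    (date(2017, 1, 15), 1000, 150000),
    (date(2017, 3, 2), 1500, 200000),
    (date(2017, 5, 20), 2000, 250000),
    (date(2017, 8, 9), 1200, 172000),
    (date(2017, 10, 30), 1800, 228000),
]

# Time split: train on earlier sales, test on later ones, so the model
# is never evaluated on data that precedes what it was trained on.
cutoff = date(2017, 7, 1)
train = [(x, y) for d, x, y in sales if d < cutoff]
test = [(x, y) for d, x, y in sales if d >= cutoff]

# Fit y = slope * x + intercept by ordinary least squares.
n = len(train)
mean_x = sum(x for x, _ in train) / n
mean_y = sum(y for _, y in train) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in train)
         / sum((x - mean_x) ** 2 for x, _ in train))
intercept = mean_y - slope * mean_x

# Evaluate on the held-out period with root mean squared error.
rmse = sqrt(sum((slope * x + intercept - y) ** 2 for x, y in test) / len(test))

# Save and reload the fitted parameters.
saved = json.dumps({"slope": slope, "intercept": intercept})
restored = json.loads(saved)

print(f"slope={slope:.1f} intercept={intercept:.1f} rmse={rmse:.1f}")
```

Interpreting the result here is simple: the slope is the estimated dollars added per extra square foot, and the RMSE is the typical prediction error in dollars on the later sales.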
Choosing the Algorithm (50 xp)
Which MLlib Module? (50 xp)
Creating Time Splits (100 xp)
Adjusting Time Features (100 xp)
Feature Engineering Assumptions for RFR (50 xp)
Feature Engineering For Random Forests (50 xp)
Dropping Columns with Low Observations (100 xp)
Naively Handling Missing and Categorical Values (100 xp)
Building a Model (50 xp)
Building a Regression Model (100 xp)
Evaluating & Comparing Algorithms (100 xp)
Understanding Metrics (50 xp)
Interpreting, Saving & Loading (50 xp)
Interpreting Results (100 xp)
Saving & Loading Models (100 xp)
Final Thoughts (50 xp)
Datasets
2017 St Paul MN Real Estate Dataset
Collaborators
John Hogue
Lead Data Scientist, General Mills