Supervised Learning in R: Regression
In this course you will learn how to predict future events using linear regression, generalized additive models, random forests, and xgboost.
4 hours | 19 videos | 65 exercises | 41,785 learners | Statement of Accomplishment
Course Description
From a machine learning perspective, regression is the task of predicting numerical outcomes from various inputs. In this course, you'll learn about different regression models, how to train them in R, how to evaluate them, and how to use them to make predictions.
Included in the following tracks:
Machine Learning Fundamentals in R
Machine Learning Scientist in R
What is Regression?
In this chapter we introduce the concept of regression from a machine learning point of view. We will present the fundamental regression method: linear regression. We will show how to fit a linear regression model and how to make predictions from it.
- Welcome and Introduction (50 xp)
- Identify the regression tasks (50 xp)
- Linear regression - the fundamental method (50 xp)
- Code a simple one-variable regression (100 xp)
- Examining a model (100 xp)
- Predicting once you fit a model (50 xp)
- Predicting from the unemployment model (100 xp)
- Multivariate linear regression (Part 1) (100 xp)
- Multivariate linear regression (Part 2) (100 xp)
- Wrapping up linear regression (50 xp)
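The chapter's central step, fitting a line and then predicting from it, can be sketched as follows. This is a minimal illustration in Python, not the course's R code. In R the fit would typically be `lm(y ~ x)` and the predictions would come from `predict()`. The helper names `fit_line` and `predict`, and the toy data, are hypothetical. The sketch uses the closed-form ordinary least squares solution for a single input: slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x).

```python
# Minimal sketch: one-variable ordinary least squares, then prediction.
# Hypothetical helpers; in R the equivalent fit is typically lm(y ~ x).


def fit_line(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Centered cross-product and sum of squares of x.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x has no variance, so the slope is undefined")
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


def predict(model, new_xs):
    """Apply a fitted (intercept, slope) model to new inputs."""
    intercept, slope = model
    return [intercept + slope * x for x in new_xs]


# Hypothetical toy data, roughly y = 1 + 2x plus noise.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]
model = fit_line(xs, ys)
print("intercept, slope:", model)
print("predictions for x = 6, 7:", predict(model, [6.0, 7.0]))
```

Multivariate linear regression extends the same least-squares idea to several inputs. In R, `lm()` handles both cases through the same formula interface.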
Training and Evaluating Regression Models
Now that we have learned how to fit basic linear regression models, we will learn how to evaluate how well our models perform. We will review how to evaluate a model graphically and look at two basic metrics for regression models: root mean squared error (RMSE) and R-squared. We will also learn how to train a model that will perform well in the wild, not just on training data. Although we will demonstrate these techniques using linear regression, all these concepts apply to models fit with any regression algorithm.
- Evaluating a model graphically (50 xp)
- Graphically evaluate the unemployment model (100 xp)
- The gain curve to evaluate the unemployment model (100 xp)
- Root Mean Squared Error (RMSE) (50 xp)
- Calculate RMSE (100 xp)
- R-squared (50 xp)
- Calculate R-squared (100 xp)
- Correlation and R-squared (100 xp)
- Properly Training a Model (50 xp)
- Generating a random test/train split (100 xp)
- Train a model using test/train split (100 xp)
- Evaluate a model using test/train split (100 xp)
- Create a cross validation plan (100 xp)
- Evaluate a modeling procedure using n-fold cross-validation (100 xp)
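This chapter's two metrics and its cross-validation idea can be sketched together. The following is a hedged Python illustration, not the course's R code. The names `rmse`, `r_squared`, `fit_line`, `kfold_plan`, and `cross_val_predictions` are hypothetical helpers, and the data are synthetic.

- RMSE is the square root of the mean squared residual.
- R-squared is 1 minus the ratio of the residual sum of squares to the total sum of squares.
- Cross-validation predicts every row with a model trained only on the other folds. Its RMSE therefore estimates performance on new data rather than on the training set.

```python
import random


def rmse(predictions, actuals):
    """Root mean squared error: the typical size of a prediction error, in outcome units."""
    n = len(actuals)
    return (sum((p - a) ** 2 for p, a in zip(predictions, actuals)) / n) ** 0.5


def r_squared(predictions, actuals):
    """1 - RSS/TSS: the fraction of outcome variance the predictions explain."""
    mean_a = sum(actuals) / len(actuals)
    rss = sum((a - p) ** 2 for p, a in zip(predictions, actuals))
    tss = sum((a - mean_a) ** 2 for a in actuals)
    return 1.0 - rss / tss


def fit_line(xs, ys):
    """One-variable least squares; returns (intercept, slope)."""
    x_mean, y_mean = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


def kfold_plan(n, k, seed=0):
    """Shuffle row indices 0..n-1, deal them into k test folds, return (train, test) pairs."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [sorted(idx[i::k]) for i in range(k)]
    return [(sorted(set(idx) - set(test)), test) for test in folds]


def cross_val_predictions(xs, ys, k, seed=0):
    """Predict each row with a model that never saw that row during training."""
    preds = [None] * len(xs)
    for train, test in kfold_plan(len(xs), k, seed):
        b0, b1 = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            preds[i] = b0 + b1 * xs[i]
    return preds


# Hypothetical toy data: y = 1 + 2x plus Gaussian noise.
rng = random.Random(42)
xs = [float(i) for i in range(20)]
ys = [1.0 + 2.0 * x + rng.gauss(0.0, 1.0) for x in xs]

b0, b1 = fit_line(xs, ys)
in_sample = [b0 + b1 * x for x in xs]
out_of_sample = cross_val_predictions(xs, ys, k=5)
print("in-sample RMSE:", rmse(in_sample, ys), "R^2:", r_squared(in_sample, ys))
print("5-fold CV RMSE:", rmse(out_of_sample, ys), "R^2:", r_squared(out_of_sample, ys))
```

A single random test/train split applies the same idea with only one held-out set. k-fold cross-validation, by contrast, evaluates the modeling procedure rather than one particular fitted model.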
Issues to Consider
Before moving on to more sophisticated regression techniques, we will look at some other modeling issues: modeling with categorical inputs, interactions between variables, and when you might consider transforming inputs and outputs before modeling. While more sophisticated regression techniques manage some of these issues automatically, it's important to be aware of them in order to understand which methods best handle various issues, and which issues you must still manage yourself.
- Categorical inputs (50 xp)
- Examining the structure of categorical inputs (100 xp)
- Modeling with categorical inputs (100 xp)
- Interactions (50 xp)
- Modeling an interaction (100 xp)
- Modeling an interaction (2) (100 xp)
- Transforming the response before modeling (50 xp)
- Relative error (100 xp)
- Modeling log-transformed monetary output (100 xp)
- Comparing RMSE and root-mean-squared Relative Error (100 xp)
- Transforming inputs before modeling (50 xp)
- Input transforms: the "hockey stick" (100 xp)
- Input transforms: the "hockey stick" (2) (100 xp)
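The chapter's point about relative error and log-transformed monetary outcomes can be illustrated numerically. The Python sketch below uses hypothetical helper names and made-up values. It compares two sets of predictions for outcomes that span four orders of magnitude: one set is always 10% too high, and the other is always 10 units too high.

RMSE is dominated by the largest outcomes. Root-mean-squared relative error instead scales each row's error by that row's actual value. Error measured on the log scale behaves approximately like relative error, which is one reason to model log-transformed monetary output.

```python
import math


def rmse(predictions, actuals):
    """Absolute error scale: dominated by the rows with the largest outcomes."""
    n = len(actuals)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predictions, actuals)) / n)


def rms_relative_error(predictions, actuals):
    """Root-mean-squared relative error; assumes every actual value is nonzero."""
    n = len(actuals)
    return math.sqrt(sum(((p - a) / a) ** 2 for p, a in zip(predictions, actuals)) / n)


def rms_log_error(predictions, actuals):
    """RMS error on the log scale; assumes all values are positive.
    For small errors, log(p) - log(a) is approximately (p - a) / a."""
    n = len(actuals)
    return math.sqrt(
        sum((math.log(p) - math.log(a)) ** 2 for p, a in zip(predictions, actuals)) / n
    )


# Hypothetical monetary outcomes spanning four orders of magnitude.
actual = [10.0, 100.0, 1000.0, 10000.0]
pred_pct = [11.0, 110.0, 1100.0, 11000.0]  # every prediction 10% too high
pred_abs = [20.0, 110.0, 1010.0, 10010.0]  # every prediction 10 units too high

for name, preds in [("10% too high", pred_pct), ("10 units too high", pred_abs)]:
    print(name,
          "| RMSE:", round(rmse(preds, actual), 3),
          "| RMS relative error:", round(rms_relative_error(preds, actual), 4),
          "| RMS log error:", round(rms_log_error(preds, actual), 4))
```

The percentage-error predictions score badly on RMSE but have a uniform relative error. The constant-offset predictions show the reverse pattern, because a 10-unit miss is large for the smallest outcome.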
Dealing with Non-Linear Responses
Now that we have mastered linear models, we will begin to look at techniques for modeling situations that don't meet the assumptions of linearity. This includes predicting probabilities and frequencies (values bounded between 0 and 1), predicting counts (nonnegative integer values, and associated rates), and modeling responses that have a non-linear but additive relationship to the inputs. These algorithms are variations on the standard linear model.
- Logistic regression to predict probabilities (50 xp)
- Fit a model of sparrow survival probability (100 xp)
- Predict sparrow survival (100 xp)
- Poisson and quasipoisson regression to predict counts (50 xp)
- Poisson or quasipoisson (50 xp)
- Fit a model to predict bike rental counts (100 xp)
- Predict bike rentals on new data (100 xp)
- Visualize the bike rental predictions (100 xp)
- GAM to learn non-linear transforms (50 xp)
- Writing formulas for GAM models (50 xp)
- Writing formulas for GAM models (2) (50 xp)
- Model soybean growth with GAM (100 xp)
- Predict with the soybean model on test data (100 xp)
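Logistic and Poisson regression keep the linear predictor of an ordinary linear model but pass it through an inverse link function, so that predictions stay in the valid range. In R such models are fit with `glm()` using `family = binomial`, `poisson`, or `quasipoisson`. The Python sketch below is not that code. It uses hypothetical coefficients and counts to show both inverse links, plus a rough mean-versus-variance check for choosing between Poisson and quasipoisson.

That check is only a heuristic on the raw counts. Strictly, the Poisson assumption concerns the variance of the outcome given the inputs.

```python
import math


def inv_logit(eta):
    """Inverse of the logit link: maps any real linear predictor into (0, 1).
    Branching on the sign keeps math.exp from overflowing for large |eta|."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def inv_log(eta):
    """Inverse of the log link used by Poisson regression: maps into (0, inf)."""
    return math.exp(eta)


def dispersion_ratio(counts):
    """Sample variance divided by sample mean (assumes the mean is positive).
    A Poisson model assumes variance equals mean, so a ratio far from 1
    suggests quasipoisson instead."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return var / mean


# Hypothetical fitted coefficients, for illustration only.
b0, b1 = -1.0, 0.5
for x in [0.0, 2.0, 4.0]:
    eta = b0 + b1 * x  # the linear predictor, as in ordinary linear regression
    print("x =", x,
          "| P(y = 1):", round(inv_logit(eta), 3),
          "| expected count:", round(inv_log(eta), 3))

# Hypothetical count data.
roughly_poisson = [1, 3, 0, 2, 5, 2, 4, 3]
overdispersed = [0, 1, 0, 12, 3, 25, 0, 7]
print("dispersion ratios:", dispersion_ratio(roughly_poisson), dispersion_ratio(overdispersed))
```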
Tree-Based Methods
In this chapter we will look at modeling algorithms that do not assume linearity or additivity, and that can learn limited types of interactions among input variables. These algorithms are *tree-based* methods that work by combining ensembles of *decision trees* that are learned from the training data.
- The intuition behind tree-based methods (50 xp)
- Predicting with a decision tree (50 xp)
- Random forests (50 xp)
- Build a random forest model for bike rentals (100 xp)
- Predict bike rentals with the random forest model (100 xp)
- Visualize random forest bike model predictions (100 xp)
- One-Hot-Encoding Categorical Variables (50 xp)
- vtreat on a small example (100 xp)
- Novel levels (100 xp)
- vtreat the bike rental data (100 xp)
- Gradient boosting machines (50 xp)
- Find the right number of trees for a gradient boosting machine (100 xp)
- Fit an xgboost bike rental model and predict (100 xp)
- Evaluate the xgboost bike rental model (100 xp)
- Visualize the xgboost bike rental model (100 xp)
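Gradient boosting, the idea behind xgboost, can be sketched with the simplest possible trees. This is a hedged, minimal Python illustration, not xgboost's algorithm. It uses depth-1 regression trees (stumps) on a single numeric input with squared-error loss: each new tree is fit to the current residuals and added with a shrinkage factor. `n_trees` and `learning_rate` are hypothetical defaults. xgboost additionally uses regularization, second-order gradient information, and deeper trees over many inputs.

```python
import math


def fit_stump(xs, residuals):
    """Depth-1 regression tree on one numeric input.
    Returns (threshold, left_value, right_value): rows with x <= threshold
    get left_value, the rest get right_value; each value is a group mean."""
    pairs = sorted(zip(xs, residuals))
    n = len(pairs)
    total = sum(r for _, r in pairs)
    best = None
    left_sum = 0.0
    for i in range(1, n):
        left_sum += pairs[i - 1][1]
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between equal x values
        left_mean = left_sum / i
        right_mean = (total - left_sum) / (n - i)
        # Minimizing within-group squared error is equivalent to maximizing this.
        score = i * left_mean ** 2 + (n - i) * right_mean ** 2
        if best is None or score > best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
            best = (score, threshold, left_mean, right_mean)
    if best is None:  # all x equal: no split possible, predict the mean everywhere
        mean = total / n
        return (math.inf, mean, mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def boost(xs, ys, n_trees=100, learning_rate=0.3):
    """Gradient boosting for squared-error loss: each stump is fit to the
    current residuals (the negative gradient) and added after shrinkage."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x) for p, x in zip(preds, xs)]
    return base, learning_rate, stumps


def predict_boosted(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def rmse(predictions, actuals):
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predictions, actuals)) / len(actuals))


# Hypothetical nonlinear toy data: a parabola that a straight line cannot fit.
xs = [float(i) for i in range(20)]
ys = [(x - 10.0) ** 2 for x in xs]
model = boost(xs, ys)
baseline = [sum(ys) / len(ys)] * len(ys)
boosted = [predict_boosted(model, x) for x in xs]
print("mean-only RMSE:", rmse(baseline, ys))
print("boosted RMSE:", rmse(boosted, ys))
```

In practice the number of trees would be chosen on held-out data. Adding trees keeps lowering training error even after performance on new data stops improving.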
Prerequisites
Introduction to Regression in R
Nina Zumel
Co-founder, Principal Consultant at Win-Vector, LLC
Nina is a co-founder and principal consultant at Win-Vector LLC, a San Francisco data science consultancy. She is co-author of the popular text Practical Data Science with R and occasionally blogs at the Win-Vector Blog on data science and R. Her technical interests include data science, statistics, statistical learning, and data visualization.
John Mount
Co-founder, Principal Consultant at Win-Vector, LLC
John is a co-founder and principal consultant at Win-Vector LLC, a San Francisco data science consultancy. He is the author of several R packages, including the data treatment package vtreat. John is co-author of Practical Data Science with R and blogs at the Win-Vector Blog about data science and R programming. His interests include data science, statistics, R programming, and theoretical computer science.