Building Recommendation Engines with PySpark
Learn tools and techniques to leverage your own big data to facilitate positive experiences for your users.
4 hours, 15 videos, 56 exercises, 12,213 learners, Statement of Accomplishment
Course Description
This course will show you how to build recommendation engines using Alternating Least Squares in PySpark. Using the popular MovieLens dataset and the Million Songs dataset, this course will take you step by step through the intuition of the Alternating Least Squares algorithm as well as the code to train, test and implement ALS models on various types of customer data.
In the following Tracks
Big Data with PySpark
1. Recommendations Are Everywhere (Free)
This chapter will show you how powerful recommendation engines can be, and draw the important distinctions between collaborative-filtering engines and content-based engines, as well as between the implicit and explicit data that recommendation engines can use. You will also learn a very powerful way to uncover hidden features (latent features) that you may not even know exist in customer datasets.
Why learn how to build recommendation engines? (50 xp)
See the power of a recommendation engine (100 xp)
Power of recommendation engines (50 xp)
Recommendation engine types and data types (50 xp)
Collaborative vs content-based filtering (50 xp)
Collaborative vs content based filtering part II (50 xp)
Implicit vs explicit data (100 xp)
Ratings data types (100 xp)
Uses for recommendation engines (50 xp)
Alternate uses of recommendation engines. (50 xp)
Confirm understanding of latent features (100 xp)
2. How does ALS work?
In this chapter you will review basic concepts of matrix multiplication and matrix factorization, and dive into how the Alternating Least Squares algorithm works and what arguments and hyperparameters it uses to return the best recommendations possible. You will also learn important techniques for properly preparing your data for ALS in Spark.
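As a rough illustration of the alternating idea, the sketch below factorizes a tiny ratings table into one latent feature per user and per item using plain Python. It is not Spark's ALS implementation: the toy data, the rank of 1, and the names `reg`, `solve_side`, and `history` are assumptions made for this example. Each pass holds one side fixed and solves a regularized least-squares problem for the other, then swaps.

```python
import math

# Toy ratings: (user, item, rating). Pairs that are absent are unobserved.
ratings = [
    (0, 0, 5.0), (0, 1, 3.0),
    (1, 0, 4.0), (1, 2, 1.0),
    (2, 1, 2.0), (2, 2, 1.0),
]
n_users, n_items = 3, 3
reg = 0.1  # assumed L2 regularization strength for this sketch


def rmse(u, v):
    """Root mean squared error over the observed ratings only."""
    se = sum((r - u[i] * v[j]) ** 2 for i, j, r in ratings)
    return math.sqrt(se / len(ratings))


def solve_side(fixed, n, pick):
    """Closed-form least-squares update of one factor vector.

    `fixed` is the vector held constant; pick(i, j) returns
    (index being solved, index into `fixed`).
    """
    num = [0.0] * n
    den = [reg] * n
    for i, j, r in ratings:
        a, b = pick(i, j)
        num[a] += r * fixed[b]
        den[a] += fixed[b] ** 2
    return [num[k] / den[k] for k in range(n)]


u = [1.0] * n_users  # one latent feature per user
v = [1.0] * n_items  # one latent feature per item
history = [rmse(u, v)]
for _ in range(10):
    u = solve_side(v, n_users, lambda i, j: (i, j))  # fix items, solve users
    v = solve_side(u, n_items, lambda i, j: (j, i))  # fix users, solve items
    history.append(rmse(u, v))

print(f"RMSE before: {history[0]:.3f}, after: {history[-1]:.3f}")
```

With more than one latent feature, each update becomes a small matrix solve rather than a scalar division, but the alternation between the user side and the item side stays the same.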
Overview of matrix multiplication (50 xp)
Matrix multiplication (100 xp)
Matrix multiplication part II (100 xp)
Overview of matrix factorization (50 xp)
Matrix factorization (100 xp)
Non-negative matrix factorization (100 xp)
How ALS alternates to generate predictions (50 xp)
Estimating recommendations (100 xp)
RMSE as ALS alternates (100 xp)
Data preparation for Spark ALS (50 xp)
Correct format and distinct users (100 xp)
Assigning integer id's to movies (100 xp)
ALS parameters and hyperparameters (50 xp)
Build out an ALS model (100 xp)
Build RMSE evaluator (100 xp)
Get RMSE (100 xp)
3. Recommending Movies
In this chapter you will be introduced to the MovieLens dataset. You will walk through how to assess its suitability for ALS, build out a full cross-validated ALS model on it, and learn how to evaluate its performance. This will be the foundation for all subsequent ALS models you build using PySpark.
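One common way to assess a ratings table before fitting ALS is to measure its sparsity: the share of user-movie pairs that have no rating. The sketch below computes it in plain Python on a made-up table; the rows and the `sparsity` helper are assumptions for illustration, not MovieLens data or course code. On a Spark DataFrame the same three counts would typically come from `count()` and `select(...).distinct().count()`.

```python
# Toy (userId, movieId, rating) rows standing in for a ratings table.
ratings = [
    (1, 10, 4.0), (1, 20, 3.5),
    (2, 10, 5.0),
    (3, 30, 2.0), (3, 20, 4.5),
]


def sparsity(rows):
    """Fraction of the user x movie matrix that has no rating."""
    users = {u for u, _, _ in rows}
    movies = {m for _, m, _ in rows}
    possible = len(users) * len(movies)  # every user rating every movie
    return 1.0 - len(rows) / possible


print(f"{sparsity(ratings):.2%} of the user-movie matrix is empty")
```

This assumes each user-movie pair appears at most once; duplicate rows would need to be removed or aggregated first.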
Introduction to the MovieLens dataset (50 xp)
Viewing the MovieLens Data (100 xp)
Calculate sparsity (100 xp)
The GroupBy and Filter methods (100 xp)
MovieLens Summary Statistics (100 xp)
View Schema (100 xp)
ALS model buildout on MovieLens Data (50 xp)
Create test/train splits and build your ALS model (100 xp)
Tell Spark how to tune your ALS model (100 xp)
Build your cross validation pipeline (100 xp)
Best Model and Best Model Parameters (100 xp)
Model Performance Evaluation (50 xp)
Generate predictions and calculate RMSE (100 xp)
Interpreting the RMSE (50 xp)
Do recommendations make sense (100 xp)
4. What if you don't have customer ratings?
In most real-life situations, you won't have "perfect" customer data available to build an ALS model. This chapter will teach you how to use your customer behavior data to "infer" customer ratings and use those inferred ratings to build an ALS recommendation engine. Using the Million Songs Dataset as well as another version of the MovieLens dataset, this chapter will show you how to use the data available to you to build a recommendation engine using ALS and evaluate its performance.
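Behavior logs usually record only what a user did, not what they skipped. One way to turn such logs into implicit ratings is to add an explicit zero for every user-song pair with no recorded activity. The sketch below shows that step in plain Python; the play counts and the `add_zeros` helper are hypothetical and are not taken from the Million Songs Dataset or the course exercises.

```python
from itertools import product

# Toy (userId, songId, num_plays) rows; only songs a user actually played appear.
plays = [
    (1, "a", 3),
    (1, "b", 1),
    (2, "b", 7),
]


def add_zeros(rows):
    """Return one row per (user, song) pair, with 0 plays where none were logged."""
    observed = {(u, s): n for u, s, n in rows}
    users = sorted({u for u, _, _ in rows})
    songs = sorted({s for _, s, _ in rows})
    return [(u, s, observed.get((u, s), 0)) for u, s in product(users, songs)]


full = add_zeros(plays)
for row in full:
    print(row)
```

The sketch assumes one row per user-song pair in the input. The filled-in table grows with the number of users times the number of songs, so on large data this step is worth doing in a distributed engine rather than in memory.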
Introduction to the Million Songs Dataset (50 xp)
Confirm understanding of implicit rating concepts (50 xp)
MSD summary statistics (100 xp)
Grouped summary statistics (100 xp)
Add zeros (100 xp)
Evaluating implicit ratings models (50 xp)
Specify ALS hyperparameters (100 xp)
Build implicit models (100 xp)
Running a cross-validated implicit ALS model (100 xp)
Extracting parameters (100 xp)
Overview of binary, implicit ratings (50 xp)
Binary model performance (100 xp)
Recommendations from binary data (100 xp)
Course recap (50 xp)
Collaborators
Jamen Long, Data Scientist