CI/CD for Machine Learning
Elevate your Machine Learning Development with CI/CD using GitHub Actions and Data Version Control
5 hours | 15 videos | 46 exercises | 3,571 learners | Statement of Accomplishment
Course Description
The course will empower you to streamline your machine learning development processes, enhancing efficiency, reliability, and reproducibility in your projects. Throughout the course, you'll develop a comprehensive understanding of CI/CD workflows and YAML syntax, utilizing GitHub Actions (GA) for automation, training models in a pipeline, versioning datasets with DVC, performing hyperparameter tuning, and automating testing and pull requests.
Fundamentals of CI/CD, YAML, and Machine Learning
You'll be introduced to the fundamental concepts of CI/CD and YAML, and gain an understanding of the software development life cycle and key terms like build, test, and deploy. You'll define Continuous Integration, Continuous Delivery, and Continuous Deployment while examining their distinctions. You'll also explore the utility of CI/CD in machine learning and experimentation.
GitHub Actions for CI/CD Automation
You'll learn about GA, a powerful platform for implementing CI/CD workflows. You'll discover the various elements of GA, including events, actions, jobs, steps, runners, and context. You'll learn how to define workflows triggered by events such as push and pull requests and customize runner machines. You'll also gain practical experience by setting up basic CI pipelines and understanding the GA log.
Versioning Datasets with Data Version Control
You'll delve deep into Data Version Control (DVC) for versioning datasets, initializing DVC, and tracking datasets. Using DVC pipelines, you'll learn how to train classification models and generate metrics in a reproducible manner.
Optimizing Model Performance and Hyperparameter Tuning
You'll now focus on model performance analysis and hyperparameter tuning and gain practical skills in diffing metrics and plots across branches to compare changes in model performance. You'll learn how to download artifacts using GA and perform hyperparameter tuning using scikit-learn's GridSearchCV. Additionally, you'll explore automating pull requests with the best model configuration.
In the following tracks: Machine Learning Engineer
1. Introduction to Continuous Integration/Continuous Delivery and YAML (Free)
In this chapter, you will explore the essential principles of Continuous Integration/Continuous Delivery (CI/CD) and YAML. You'll grasp the software development life cycle and key terms like build, test, and deploy. Discover the differences between Continuous Integration, Continuous Delivery, and Continuous Deployment. Moreover, you'll investigate the significance of CI/CD in machine learning and experimentation.
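As a rough illustration of the YAML building blocks this chapter covers, the fragment below shows scalars, a mapping, a sequence, and a sequence of mappings. Every key and value in it (the model settings, the feature names, the script names) is made up for illustration and is not taken from the course materials.

```yaml
# Hypothetical experiment config, for illustration only.

# A mapping is a set of "key: value" pairs. Nesting is expressed with
# indentation, using spaces rather than tabs.
model:
  name: random_forest     # string scalar
  n_estimators: 100       # integer scalar
  class_weight: null      # null value

# A sequence is an ordered list. Each item starts with "- ".
features:
  - temperature
  - humidity
  - wind_speed

# A sequence can hold mappings, which is the shape workflow steps usually take.
steps:
  - name: train
    command: python train.py
  - name: evaluate
    command: python evaluate.py
```

The same two structures, mappings and sequences, are what GitHub Actions workflow files and DVC pipeline files are built from.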
Introduction to Continuous Integration/Continuous Delivery for Machine Learning (50 xp)
Continuous deployment and delivery (50 xp)
Machine learning workflow (100 xp)
Introduction to YAML (50 xp)
YAML syntax (100 xp)
YAML mappings and sequences (100 xp)
Introduction to GitHub Actions (50 xp)
Utility of GitHub Actions (100 xp)
Anatomy of GitHub Actions (100 xp)
2. GitHub Actions
Get ready to explore GitHub Actions (GHA), an influential platform for executing CI/CD workflows. Uncover the diverse components of GHA, encompassing events, actions, jobs, steps, runners, and context. Gain insights into crafting workflows that activate upon events like push and pull requests, and tailor runner machines. Dive into hands-on learning as you establish fundamental CI pipelines and grasp the intricacies of the GHA log.
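The components named above (events, jobs, steps, runners, actions, and context) can be seen together in a small workflow file. The following is a minimal sketch, not the course's own workflow: the file name, the job ID `build`, the Python version, and the script `hello.py` are all assumptions chosen for illustration.

```yaml
# .github/workflows/ci.yaml (hypothetical file name)
name: CI

on:                              # events that trigger the workflow
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  build:                         # job ID, chosen for illustration
    runs-on: ubuntu-latest       # GitHub-hosted runner
    steps:
      - name: Check out repository code
        uses: actions/checkout@v4          # a reusable action
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Show context values
        run: echo "Triggered by ${{ github.event_name }} on ${{ github.ref }}"
      - name: Run repository code
        run: python hello.py               # hypothetical script
```

Each `run` step prints its output to the workflow log, which is where a failing step would normally be diagnosed.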
Intermediate YAML (50 xp)
Find the correct combination (50 xp)
Design a Continuous Integration workflow (100 xp)
Setting a basic CI pipeline (50 xp)
Interpret GitHub Actions Workflow (100 xp)
Write a GitHub Actions Workflow (100 xp)
Running repository code (50 xp)
Feature branches in shared repository model (50 xp)
Running Python code in GitHub Actions (100 xp)
Environment Variables and Secrets (50 xp)
What is GITHUB_TOKEN? (50 xp)
Working with environment variables (100 xp)
Working with secrets (100 xp)
3. Continuous Integration in Machine Learning
In this chapter, you'll explore how to integrate machine learning model training into a GitHub Actions pipeline using the Continuous Machine Learning (CML) GitHub Action. You'll generate a comprehensive markdown report including model metrics and plots. You will also delve into data versioning in machine learning by adopting Data Version Control (DVC) to track data changes. The chapter also covers setting up DVC remotes and transferring datasets. Finally, you'll explore DVC pipelines, configuring a DVC YAML file to orchestrate reproducible model training.
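To make the idea of a DVC YAML file more concrete, here is a minimal single-stage `dvc.yaml` sketch. The stage name, the script `train.py`, the dataset path, and the output file names are assumptions for illustration; the course's actual pipeline may be organized differently.

```yaml
# dvc.yaml: one-stage pipeline sketch. All file names are hypothetical.
stages:
  train:
    cmd: python train.py          # command that runs the stage
    deps:                         # inputs; a change here makes the stage stale
      - train.py
      - data/weather.csv
    outs:                         # outputs tracked and cached by DVC
      - model.pkl
    metrics:
      - metrics.json:
          cache: false            # keep the small metrics file in Git instead of the DVC cache
```

Running `dvc repro` executes the stage and records the hashes of its dependencies and outputs in `dvc.lock`, so a later run skips the stage when nothing it depends on has changed.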
Model training with GitHub Actions (50 xp)
Develop a classification model (100 xp)
Train a classification model (100 xp)
Setup model training using CML (100 xp)
Versioning datasets with Data Version Control (50 xp)
Why are .dvc files needed? (50 xp)
Data versioning in action (100 xp)
Interacting with DVC remotes (50 xp)
Exploring the Benefits of DVC Remotes (50 xp)
DVC remotes in action (100 xp)
DVC Pipelines (50 xp)
Creating a DVC pipeline (100 xp)
Train ML models with DVC (100 xp)
4. Comparing training runs and Hyperparameter (HP) tuning
In this chapter, you will direct your attention towards the analysis of model performance and the fine-tuning of hyperparameters. You will acquire practical expertise in comparing metrics and visualizations across different branches to assess changes in model performance. You will conduct hyperparameter tuning using scikit-learn's GridSearchCV. Furthermore, you will delve into the automation of pull requests using the optimal model configuration.
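One possible way to compare metrics between a feature branch and `main` inside a pull request is sketched below. It assumes the repository has a `dvc.yaml` pipeline that writes a metrics file, a `requirements.txt` that installs DVC, a configured DVC remote, a default branch called `main`, and a recent DVC release in which `dvc metrics diff` accepts `--md`. The workflow and job names are hypothetical, and this is not presented as the course's exact workflow.

```yaml
# .github/workflows/compare.yaml (hypothetical file name)
name: compare-metrics

on:
  pull_request:
    branches: [main]

permissions:
  contents: read
  pull-requests: write            # lets CML post the report as a PR comment

jobs:
  compare:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - uses: iterative/setup-cml@v2
      - name: Install dependencies
        run: pip install -r requirements.txt   # assumed to include dvc
      - name: Run pipeline and report
        env:
          REPO_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          dvc pull                               # assumes remote credentials are configured
          dvc repro                              # rerun stale pipeline stages
          git fetch --depth=1 origin main:main   # make main available for the diff
          echo "## Metrics: this branch vs main" > report.md
          dvc metrics diff --md main >> report.md
          cml comment create report.md
```

The report produced this way is a markdown table of each metric's value on `main`, its value on the current branch, and the change between them.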
Comparing metrics and plots in DVC (50 xp)
Adding metrics and plots to dvc.yaml (100 xp)
Comparing metrics across Git branches (100 xp)
Run DVC pipeline in GitHub Actions (100 xp)
Hyperparameter Tuning with DVC (50 xp)
Adding Hyperparameter tuning to dvc.yaml (100 xp)
Running Hyperparameter tuning DVC pipelines (100 xp)
GitHub Actions workflow for Hyperparameter Tuning (50 xp)
Loose Coupling (50 xp)
Setup Hyperparameter Tuning in GitHub Actions (100 xp)
Congratulations! (50 xp)
Datasets
Weather
Collaborators
Ravi Bhadauria
Senior Machine Learning Engineer
Ravi is a senior ML Engineer at Etsy, where he focuses on solving problems at the intersection of Machine Learning and Distributed Systems. Previously, he worked in the healthcare and computational lithography domains. He holds a PhD specializing in Computational Chemical Physics and Mechanical Engineering.