
Introduction to Data Versioning with DVC

Explore Data Version Control for ML data management. Master setup, automate pipelines, and evaluate models seamlessly.

3 hours, 12 videos, 35 exercises



Course description

This course offers a comprehensive introduction to Data Version Control (DVC), a tool designed for efficient management and versioning of machine learning data. You will learn about the machine learning product lifecycle, how data versioning differs from code versioning, and what features and use cases DVC offers.

Exploring DVC features

You will understand the motivations behind data versioning, the machine learning lifecycle, and DVC’s distinct features and use cases. You will also learn about DVC setup, covering installation, repository initialization, and the .dvcignore file. You will explore the DVC cache and staging files, learn to add and remove files, manage caches, and understand the underlying mechanisms. You will learn how DVC remotes differ from Git remotes and how to add, list, and modify them. You will also learn to interact with remotes: pushing and pulling data, checking out specific versions, and fetching data to the cache.
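
DVC's cache is content-addressed: a tracked file is stored under a name derived from the MD5 hash of its contents, so identical data is stored only once. The following is a minimal Python sketch of that idea, not DVC's actual implementation. The function names `md5_of_bytes` and `cache_path` are hypothetical, and the two-character directory split is an assumption about the cache layout, which differs between DVC versions.

```python
import hashlib
from pathlib import PurePosixPath


def md5_of_bytes(data: bytes) -> str:
    """Return the hexadecimal MD5 digest of a byte string."""
    return hashlib.md5(data).hexdigest()


def cache_path(digest: str, cache_root: str = ".dvc/cache") -> PurePosixPath:
    """Map a digest to a content-addressed path.

    Assumed layout (hypothetical): the first two hex characters form a
    directory name and the remaining characters form the file name.
    """
    return PurePosixPath(cache_root) / digest[:2] / digest[2:]


digest = md5_of_bytes(b"hello")
print(digest)
print(cache_path(digest))

# Identical content always maps to the same cache entry.
print(cache_path(md5_of_bytes(b"hello")) == cache_path(digest))
```

Because the path depends only on the content, renaming a file in the workspace does not create a second copy in the cache.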

Automate and evaluate

You will see why automating ML pipelines matters, with an emphasis on modularizing code and creating a configuration file. You will be introduced to DVC pipelines as directed acyclic graphs (DAGs), with hands-on experience adding stages and their inputs and outputs. You will practice executing these pipelines efficiently to support different model training use cases. The course concludes with evaluation, showing how metrics and plots are tracked in DVC.
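
One reason pipelines can run efficiently is that a stage only needs to re-run when one of its inputs has changed. A minimal Python sketch of that check follows, assuming dependency contents are hashed and compared against hashes recorded from the previous run. The names `content_hash` and `needs_rerun`, the file names, and the dictionaries are hypothetical illustrations, not DVC's API.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Return the hexadecimal MD5 digest of the given content."""
    return hashlib.md5(data).hexdigest()


def needs_rerun(deps: dict[str, bytes], recorded: dict[str, str]) -> bool:
    """Return True if any dependency is new, missing, or has a changed hash."""
    current = {name: content_hash(data) for name, data in deps.items()}
    return current != recorded


# Hypothetical dependencies of a training stage, held in memory for simplicity.
deps = {"data.csv": b"a,b\n1,2\n", "params.yaml": b"lr: 0.1\n"}

# Hashes recorded after the previous (hypothetical) run.
recorded = {name: content_hash(data) for name, data in deps.items()}

print(needs_rerun(deps, recorded))  # nothing has changed since the last run

deps["params.yaml"] = b"lr: 0.01\n"  # edit a hyperparameter
print(needs_rerun(deps, recorded))
```

Keeping hyperparameters in a configuration file makes them an ordinary dependency, so changing one value is enough to mark the affected stage as stale.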
In the following tracks

Machine Learning Engineer

Machine Learning in Production in Python
  1. Introduction to DVC (free)

    This chapter provides a comprehensive introduction to Data Version Control (DVC), a tool essential for data versioning in machine learning. Learners will explore the motivation behind data versioning, understand its differences from code versioning, and experiment with a simple classification problem. They will review basic Git commands, learn about DVC, and practice setting up a repository. The chapter concludes with an overview of DVC’s features and use cases, including versioning data and models, CI/CD for machine learning, experiment tracking, pipelines, and more.

    Data Versioning Motivation (50 xp)
    Anatomy of a Machine Learning Model (100 xp)
    Differences Between Data and Code Versioning (50 xp)
    Understanding Hyperparameters (50 xp)
    Introduction to DVC (50 xp)
    Working with Git CLI (100 xp)
    Review DVC CLI (50 xp)
    DVC features and use cases (50 xp)
    DVC pipelines (50 xp)
    CI/CD for machine learning (50 xp)
  2. DVC Configuration and Data Management

    This chapter covers DVC setup: installation, repository initialization, and the .dvcignore file. It then explores the DVC cache and staging files, showing how to add and remove files, manage caches, and understand the underlying mechanism based on MD5 hashes. The chapter also explains DVC remotes, how they differ from Git remotes, and how to add, list, and modify them. Finally, it shows how to interact with remotes by pushing and pulling data, checking out specific versions, and fetching data to the cache.

  3. Pipelines in DVC

    This chapter focuses on automating ML pipelines using DVC. Learners create a configuration file containing settings and hyperparameters. They also learn about pipeline visualization using directed acyclic graphs and use commands to describe dependencies, commands, and outputs. Execution of DVC pipelines is covered, including local model training and how Git tracks DVC metadata. Additionally, learners explore metrics and plots tracking in DVC, including how to print metrics, create plot files, and compare metrics and plots across different pipeline stages.
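
Because a pipeline is a directed acyclic graph, a valid execution order can be derived from each stage's dependencies and outputs. The sketch below illustrates this with Python's standard `graphlib` module. The stage names, file names, and the `stages` dictionary are hypothetical and only loosely mirror how a stage declares its dependencies and outputs.

```python
from graphlib import TopologicalSorter

# Hypothetical stages: each lists the files it reads (deps) and writes (outs).
stages = {
    "evaluate": {"deps": ["model.pkl", "test.csv"], "outs": ["metrics.json"]},
    "train": {"deps": ["train.csv"], "outs": ["model.pkl"]},
    "preprocess": {"deps": ["raw.csv"], "outs": ["train.csv", "test.csv"]},
}

# Map each output file to the stage that produces it.
producer = {out: name for name, stage in stages.items() for out in stage["outs"]}

# A stage depends on every stage that produces one of its input files.
# Files with no producer (such as raw.csv) are external inputs.
graph = {
    name: {producer[dep] for dep in stage["deps"] if dep in producer}
    for name, stage in stages.items()
}

# static_order() yields each stage only after all of its predecessors.
order = list(TopologicalSorter(graph).static_order())
print(order)
```

If two stages declared each other's outputs as dependencies, `static_order()` would raise `graphlib.CycleError`, which is why a pipeline has to be acyclic.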


Contributors

George Boorman
Arne Warnke
Katerina Zahradova

Prerequisites

Supervised Learning with scikit-learn
Introduction to Git

Ravi Bhadauria

Senior Machine Learning Engineer
