Data Versioning and Experiment Tracking for Machine Learning

Chapter 1: The Need for Reproducibility in Machine Learning

Challenges in Managing ML Projects

Why Git Alone Is Not Sufficient

Defining Reproducibility in ML

Components of a Reproducible ML Workflow

Introduction to Data Versioning Concepts

Introduction to Experiment Tracking Concepts

Quiz for Chapter 1

Chapter 2: Versioning Data with DVC

Data Versioning Strategies

Introducing Data Version Control (DVC)

Setting Up DVC in a Project

Tracking Data Files and Directories

Storing and Retrieving Data Versions

Connecting DVC to Remote Storage (S3, GCS, Azure Blob)

Switching Between Data Versions

Hands-on Practical: Versioning a Dataset

Quiz for Chapter 2

Chapter 3: Tracking Experiments with MLflow

The Importance of Experiment Tracking

Introducing MLflow Tracking

Setting Up MLflow

Logging Parameters and Metrics

Logging Artifacts (Models, Plots, Files)

Organizing Runs with Experiments

Using the MLflow UI

Comparing Experiment Runs

Practice: Tracking a Training Run

Chapter 4: Integrating DVC and MLflow for Reproducible Workflows

Connecting Data Versions to Experiments

Structuring Projects for Integration

Logging DVC Metadata in MLflow

Creating DVC Pipelines

Reproducing DVC Pipelines

Tracking DVC Pipeline Metrics

Combining DVC Pipelines and MLflow Tracking

Best Practices for Integrated Workflows

Hands-on Practical: Building an Integrated Pipeline

Prerequisites: Python & ML Basics

Level: Intermediate

What You'll Learn

Understand Core Concepts
Grasp the importance and principles of versioning data and tracking experiments in the ML lifecycle.
Implement Data Versioning
Utilize Data Version Control (DVC) to manage datasets, track changes, and ensure data reproducibility.
Implement Experiment Tracking
Employ MLflow Tracking to log parameters, metrics, code versions, and artifacts for ML experiments.
Integrate Tools
Combine data versioning and experiment tracking into a cohesive MLOps workflow.
Build Reproducible Pipelines
Structure ML projects for better reproducibility, collaboration, and debugging.

© 2025 ApX Machine Learning