As we established in the previous chapter, managing data effectively is fundamental to creating reproducible machine learning workflows. Standard version control systems like Git excel at tracking changes in text-based code files, but they struggle with the scale and nature of typical ML datasets. So, how do teams manage evolving datasets? Let's examine some common strategies, ranging from simple manual methods to more sophisticated approaches, highlighting their strengths and weaknesses.
Perhaps the most basic approach is simply renaming files or directories to indicate different versions. You might have encountered folders like `data_processed_v1` and `data_processed_v2`, or files such as `features_final.csv` and `features_final_really_final.csv`.
While simple, manual versioning lacks the rigor needed for serious ML development and reproducibility: there is no record of what changed or why, no guarantee that a name corresponds to unique content, and no link between a dataset version and the code or experiment that produced it.
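One concrete symptom of name-based versioning is that nothing ties a name to its contents. Content hashing, which the specialized tools discussed later in this chapter rely on, makes a version verifiable. A minimal Python sketch (the file names are hypothetical):

```python
import hashlib
from pathlib import Path

def file_md5(path: Path) -> str:
    """Stream the file through MD5 so large datasets never need to fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# Two differently named "versions" can hold identical bytes.
v1 = Path("data_processed_v1.csv")
v2 = Path("data_processed_v2.csv")
v1.write_text("id,label\n1,cat\n")
v2.write_text("id,label\n1,cat\n")
print(file_md5(v1) == file_md5(v2))  # True: the names suggest two versions, the content says one
```

A hash identifies content regardless of what the file is called, which is exactly the property the naming scheme above cannot provide.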
Cloud storage providers (AWS S3, Google Cloud Storage, Azure Blob Storage) often offer their own versioning features. You can enable versioning on a storage bucket, and the provider will keep previous versions of objects when they are overwritten or deleted.
Cloud storage versioning is useful for backup and disaster recovery but doesn't directly address the tight coupling needed between code, data, and experiments in ML.
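The semantics providers implement are easy to model: an overwrite appends a new object version rather than replacing the old bytes, and older versions remain retrievable by id. The following is a toy in-memory model of that behavior, not any provider's API:

```python
from collections import defaultdict

class VersionedBucket:
    """Toy model of provider-side object versioning: every overwrite
    appends a new version instead of replacing the old bytes."""
    def __init__(self):
        self._versions = defaultdict(list)

    def put(self, key, data):
        self._versions[key].append(data)
        return len(self._versions[key]) - 1  # version id of the new object

    def get(self, key, version=None):
        versions = self._versions[key]
        return versions[-1] if version is None else versions[version]

bucket = VersionedBucket()
bucket.put("data/train.csv", b"v1 bytes")
bucket.put("data/train.csv", b"v2 bytes")       # overwrite keeps the old version
print(bucket.get("data/train.csv"))             # b'v2 bytes' (latest)
print(bucket.get("data/train.csv", version=0))  # b'v1 bytes' (still retrievable)
```

Notice what is missing from this model: nothing records which Git commit, experiment, or pipeline produced version 0 versus version 1, which is the gap the rest of this chapter addresses.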
Git LFS is a Git extension designed to handle large files more efficiently. Instead of storing large binary files directly in the Git repository history (which bloats the repository quickly), Git LFS stores pointers (small text files) in Git. The actual large files are stored on a separate LFS server (which could be self-hosted or provided by services like GitHub, GitLab, Bitbucket). When you check out a commit, Git LFS downloads the required large files based on the pointers.
Developers continue to use the familiar Git workflow (`git add`, `git commit`, `git push`, `git pull`), with LFS handling the tracked files behind the scenes. Git LFS is a definite improvement over storing large files directly in Git, but it's a general-purpose solution for large files, not a tailored solution for the specific needs of data versioning in machine learning: it has no notion of datasets, pipelines, or experiments.
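The pointer mechanism can be sketched in a few lines. The pointer text below follows the published Git LFS pointer format (`version`/`oid`/`size` lines); the store directory and file names are invented for illustration:

```python
import hashlib
from pathlib import Path

def make_pointer(path: Path, store: Path) -> str:
    """Stash the real bytes in a separate store keyed by SHA-256,
    and replace the file's content with a small LFS-style pointer."""
    data = path.read_bytes()
    oid = hashlib.sha256(data).hexdigest()
    store.mkdir(parents=True, exist_ok=True)
    (store / oid).write_bytes(data)            # plays the role of the LFS server
    pointer = (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(data)}\n"
    )
    path.write_text(pointer)                   # Git now tracks only this small stub
    return oid

big = Path("model_weights.bin")
big.write_bytes(b"\x00" * 4096)                # stand-in for a large binary
oid = make_pointer(big, Path(".lfs-store"))
print(big.stat().st_size)                      # a few hundred bytes at most: the pointer, not the data
```

The repository history stays small because only the stub changes hands through Git; the heavy bytes travel separately.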
Recognizing the limitations of the above methods, specialized tools have emerged specifically for versioning data and models within ML projects. Data Version Control (DVC), the focus of this chapter, is a prime example. These tools typically work alongside Git, leveraging Git for code versioning while providing dedicated mechanisms for data.
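A rough sketch of the mechanism these tools build on: a small metafile that Git can track, pointing at content cached by hash, plus a way to restore the exact version the metafile names. Real DVC `.dvc` files are YAML with more fields; the JSON metafile, cache directory name, and function names here are invented for illustration:

```python
import hashlib
import json
from pathlib import Path

CACHE = Path(".dvc-style-cache")

def add(data: Path) -> Path:
    """Cache the data by content hash; return the small metafile Git would track."""
    raw = data.read_bytes()
    md5 = hashlib.md5(raw).hexdigest()
    CACHE.mkdir(exist_ok=True)
    (CACHE / md5).write_bytes(raw)
    meta = Path(str(data) + ".dvc")
    meta.write_text(json.dumps({"md5": md5, "path": data.name}))
    return meta

def checkout(meta: Path) -> Path:
    """Restore exactly the data version the metafile points to."""
    info = json.loads(meta.read_text())
    out = Path(info["path"])
    out.write_bytes((CACHE / info["md5"]).read_bytes())
    return out

data = Path("train.csv")
data.write_text("id,label\n1,cat\n")
meta = add(data)        # metafile is tiny; the data sits in the cache
data.unlink()           # simulate a fresh clone that has the metafile but not the data
checkout(meta)          # bring back exactly the version the metafile names
print(data.read_text())
```

Because the metafile is plain text, committing it alongside the code pins a specific dataset version to a specific commit, which is the coupling the earlier approaches lack.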
For data, you use analogous commands (`dvc add`, `dvc push`, `dvc pull`) alongside the usual Git commands for code. Here's a simplified comparison:
| Feature | Manual Copying | Cloud Versioning | Git LFS | Specialized Tools (DVC) |
|---|---|---|---|---|
| Git Integration | None | Poor | Good | Excellent (alongside Git) |
| Storage | Local copies | Cloud provider | Separate LFS server | Flexible (cloud/local) |
| Reproducibility | Very low | Low | Moderate | High |
| ML Pipeline Aware | No | No | No | Yes |
| Scalability | Poor | Good | Moderate | Good |
| Granularity | Manual | File-level | File-level | File/directory/dataset |
Given the challenges outlined, specialized tools offer the most comprehensive solution for managing data in a reproducible ML setting. They bridge the gap left by Git and generic file storage systems. In the following sections, we will explore how DVC implements these principles, providing a practical and powerful way to version your data alongside your code.
© 2025 ApX Machine Learning