Building upon the concept of DVC pipelines defined in dvc.yaml, let's focus on a specific type of output: performance metrics. When a pipeline stage, such as model training or evaluation, completes, it often generates files containing key performance indicators like accuracy, precision, recall, loss, or business-specific measures. DVC provides a mechanism to explicitly track these metric files, linking them directly to the pipeline's execution state and the underlying Git commit.
You might wonder why we need DVC metric tracking when we have MLflow for comprehensive experiment logging. Tracking metrics within DVC pipelines offers several distinct advantages, particularly for workflow automation and quick comparisons tied to code and data changes:

- Metrics declared in dvc.yaml are intrinsically linked to the specific stage that produced them and the exact inputs (data, code, dependencies) used in that run.
- Since dvc.yaml and the generated dvc.lock file are tracked by Git, comparing metrics across different Git commits becomes straightforward using DVC commands. This allows you to quickly assess the impact of code changes or different data versions on performance.
- DVC provides command-line tools (dvc metrics show, dvc metrics diff) to view and compare metrics directly from your terminal, offering a faster feedback loop than navigating a UI for every small change.

While MLflow excels at detailed experiment comparison, visualization, and artifact management across many potentially unrelated runs, DVC metrics provide a focused, version-controlled view tied directly to the pipeline structure managed within your Git repository.
To tell DVC which files contain metrics generated by a pipeline stage, you add a metrics section to that stage's definition in dvc.yaml. DVC expects these files to be in a simple format it can parse, such as JSON, YAML, CSV, or TSV (Tab-Separated Values).
Consider a typical training stage defined using dvc stage add or dvc run. Let's assume this stage runs a script train.py which, upon completion, writes evaluation results to a file named results/metrics.json.
Here’s how you would modify the stage definition in dvc.yaml to declare this file as a metrics output:
stages:
  train:
    cmd: python src/train.py --data data/processed --model-out models/model.pkl --metrics-out results/metrics.json
    deps:
      - src/train.py
      - data/processed
    params:
      - training.epochs
      - training.learning_rate
    outs:
      - models/model.pkl
    metrics:  # Declare the metrics file here
      - results/metrics.json:
          cache: false  # Typically, metrics files are small and don't need DVC caching
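For reference, a stage like this can also be created from the command line rather than by editing dvc.yaml by hand. The invocation below is a sketch: it assumes DVC's -M flag, which registers a metrics output with cache: false (use -m instead if you want the metrics file cached by DVC):

# Create the 'train' stage and register results/metrics.json as an uncached metrics output
$ dvc stage add -n train \
      -d src/train.py -d data/processed \
      -p training.epochs,training.learning_rate \
      -o models/model.pkl \
      -M results/metrics.json \
      python src/train.py --data data/processed --model-out models/model.pkl --metrics-out results/metrics.json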
In the dvc.yaml stage above:

- The cmd executes the training script, which takes an argument --metrics-out specifying where to save the metrics.
- The metrics key lists the files DVC should track as metrics outputs for this stage.
- cache: false is often used for metrics files. Since they are usually small text files and their content is the primary information (rather than needing efficient storage/transfer like large datasets), disabling DVC's default caching mechanism simplifies things. The file will still be tracked by Git if it's not in .gitignore. If you do want DVC to cache and version the metrics file itself (useful if it's large or binary, though less common), you can omit cache: false or set it to true.

The results/metrics.json file might look something like this:
{
    "accuracy": 0.875,
    "precision": 0.85,
    "recall": 0.90,
    "f1_score": 0.874,
    "validation_loss": 0.25
}
DVC can parse this simple key-value structure. For CSV/TSV files, it expects a header row and will treat each column as a metric.
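If you are writing the training script yourself, producing this file is essentially a json.dump call at the end of training. The sketch below is illustrative only: the argument names mirror the cmd shown earlier, and the metric values are placeholders that would normally come from evaluating the trained model.

import argparse
import json
from pathlib import Path


def main():
    parser = argparse.ArgumentParser()
    # Arguments mirroring the cmd in dvc.yaml; --data and --model-out are accepted but unused here
    parser.add_argument("--data")
    parser.add_argument("--model-out")
    parser.add_argument("--metrics-out", default="results/metrics.json")
    args = parser.parse_args()

    # Placeholder values; in a real script these come from evaluating the trained model
    metrics = {
        "accuracy": 0.875,
        "precision": 0.85,
        "recall": 0.90,
        "f1_score": 0.874,
        "validation_loss": 0.25,
    }

    # Ensure the output directory exists, then write the metrics file DVC will track
    out_path = Path(args.metrics_out)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w") as f:
        json.dump(metrics, f, indent=4)


if __name__ == "__main__":
    main()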
Once you have defined metrics in dvc.yaml and run the pipeline using dvc repro, you can use DVC commands to inspect the results.
Show Current Metrics: The dvc metrics show command displays the metrics from the files specified in dvc.yaml for your current workspace and recent Git commits.
$ dvc metrics show
Path accuracy precision recall f1_score validation_loss
results/metrics.json 0.875 0.85 0.90 0.874 0.25
# Output might also show metrics from previous commits if available
# (e.g., from 'main' branch or specific tags)
Compare Metrics Between Revisions: The real power comes from comparison. dvc metrics diff compares the metrics between your current workspace and a specific Git revision (like a branch, tag, or commit hash), or between two revisions.
# Compare workspace against the main branch
$ dvc metrics diff main
Path Metric main workspace Change
results/metrics.json accuracy 0.860 0.875 0.015
results/metrics.json precision 0.84 0.85 0.01
results/metrics.json recall 0.88 0.90 0.02
results/metrics.json f1_score 0.859 0.874 0.015
results/metrics.json validation_loss 0.28 0.25 -0.03
# Compare two specific commits or tags
$ dvc metrics diff v1.0 v1.1
This command immediately highlights how performance changed due to modifications tracked between the specified Git revisions.
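These comparisons also lend themselves to automation, for example posting a metrics table on a pull request from CI. The flags below are a sketch and vary by DVC version (older releases spell them --show-json and --show-md):

# Machine-readable diff for scripting
$ dvc metrics diff main --json > metrics_diff.json

# Markdown table, convenient for CI comments; --precision limits displayed decimal places
$ dvc metrics diff main --md --precision 3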
While dvc metrics show and dvc metrics diff provide tabular data, visualizing trends can be more insightful. DVC offers basic plotting capabilities via dvc plots to track how metrics change across your Git history.
DVC can automatically generate plots if your metrics files follow certain structures (like CSV/TSV or JSON lists of objects). You can also define custom plot configurations in dvc.yaml using Vega-Lite templates for more control.
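As an illustration of the "JSON list of objects" structure, a per-epoch training log such as the hypothetical results/train_log.json below (not part of the stage defined above) is something DVC's plot templates can render directly, with each key usable as an axis:

[
    {"epoch": 1, "train_loss": 0.62, "val_loss": 0.58},
    {"epoch": 2, "train_loss": 0.41, "val_loss": 0.44},
    {"epoch": 3, "train_loss": 0.33, "val_loss": 0.39}
]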
For a simple case, if your results/metrics.json consistently contains the same keys across commits, you can generate plots comparing revisions:
# Show changes in accuracy between main and workspace using default plots
$ dvc plots diff main -o accuracy_changes.html --show-vega
# More commonly, define plot configurations in dvc.yaml for easier rendering
Adding a plots section to dvc.yaml allows defining named plots:
plots:
  # Simple plot using default template, showing accuracy across revisions
  - results/metrics.json:
      x: revision  # Special value for Git revision
      y: accuracy
      title: Accuracy Trend
      template: linear  # Use linear plot template
  # Plot comparing multiple metrics using specific properties
  - id: performance_metrics
    template: linear
    x: revision
    y:
      results/metrics.json: [accuracy, f1_score]  # Plot accuracy and F1
    title: Key Performance Metrics Over Revisions
    x_label: Git Revision
    y_label: Score
With such definitions, you can use commands like dvc plots show performance_metrics or dvc plots diff main --targets performance_metrics to generate visualizations (often outputting HTML files or JSON specifications for rendering).
DVC metrics and plots link metric file versions directly to Git commits, enabling tracking of performance changes alongside code and pipeline structure evolution.
While DVC provides valuable capabilities for tracking and comparing metrics tied to pipeline runs and Git history, it focuses primarily on the outputs of defined, reproducible stages. For a more comprehensive view encompassing hyperparameter tuning, arbitrary script runs, detailed artifact logging, and a richer UI for exploration, MLflow remains the tool of choice. The next section explores how to bridge these two systems, ensuring that the metrics tracked by your DVC pipeline are also captured within your MLflow experiments for unified analysis.