DVC pipelines are defined in dvc.yaml. These pipelines generate various outputs, and performance metrics represent a primary type of output. When a pipeline stage, such as model training or evaluation, completes, it often generates files containing important performance indicators like accuracy, precision, recall, loss, or business-specific measures. DVC provides a mechanism to explicitly track these metric files, linking them directly to the pipeline's execution state and the underlying Git commit.
You might wonder why we need DVC metric tracking when we have MLflow for comprehensive experiment logging. Tracking metrics within DVC pipelines offers several distinct advantages, particularly for workflow automation and quick comparisons tied to code and data changes:
- Metrics declared in dvc.yaml are intrinsically linked to the specific stage that produced them and the exact inputs (data, code, dependencies) used in that run.
- Because dvc.yaml and the generated dvc.lock file are tracked by Git, comparing metrics across different Git commits becomes straightforward using DVC commands. This allows you to quickly assess the impact of code changes or different data versions on performance.
- DVC provides commands (dvc metrics show, dvc metrics diff) to view and compare metrics directly from your terminal, offering a faster feedback loop than navigating a UI for every small change.

While MLflow excels at detailed experiment comparison, visualization, and artifact management across many potentially unrelated runs, DVC metrics provide a focused, version-controlled view tied directly to the pipeline structure managed within your Git repository.
Defining Metrics in dvc.yaml

To tell DVC which files contain metrics generated by a pipeline stage, you add a metrics section to that stage's definition in dvc.yaml. DVC expects these files to be in a simple format it can parse, such as JSON, YAML, CSV, or TSV (Tab-Separated Values).
Consider a typical training stage defined using dvc stage add or dvc run. Let's assume this stage runs a script train.py which, upon completion, writes evaluation results to a file named results/metrics.json.
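If you create the stage from the command line, dvc stage add can register the metrics file at the same time via the -M flag (short for --metrics-no-cache, which corresponds to the cache: false setting shown below). A sketch using the same paths as this example:

$ dvc stage add -n train \
    -d src/train.py -d data/processed \
    -p training.epochs,training.learning_rate \
    -o models/model.pkl \
    -M results/metrics.json \
    "python src/train.py --data data/processed --model-out models/model.pkl --metrics-out results/metrics.json"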
Here’s how you would modify the stage definition in dvc.yaml to declare this file as a metrics output:
stages:
  train:
    cmd: python src/train.py --data data/processed --model-out models/model.pkl --metrics-out results/metrics.json
    deps:
      - src/train.py
      - data/processed
    params:
      - training.epochs
      - training.learning_rate
    outs:
      - models/model.pkl
    metrics:                       # Declare the metrics file here
      - results/metrics.json:
          cache: false             # Typically, metrics files are small and don't need DVC caching
In this example:
- The cmd executes the training script, which takes an argument --metrics-out specifying where to save the metrics.
- The metrics key lists the files DVC should track as metrics outputs for this stage.
- cache: false is often used for metrics files. Since they are usually small text files and their content is the primary information (rather than needing efficient storage/transfer like large datasets), disabling DVC's default caching mechanism simplifies things. The file will still be tracked by Git if it is not listed in .gitignore. If you do want DVC to cache and version the metrics file itself (useful if it is large or binary, though less common), you can omit cache: false or set it to true.

The results/metrics.json file might look something like this:
{
  "accuracy": 0.875,
  "precision": 0.85,
  "recall": 0.90,
  "f1_score": 0.874,
  "validation_loss": 0.25
}
DVC can parse this simple key-value structure. For CSV/TSV files, it expects a header row and will treat each column as a metric.
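How this file gets produced is entirely up to your script. Below is a minimal sketch of the metric-writing portion of train.py; the --metrics-out argument name matches the stage command above, and the metric values are placeholders standing in for the results of a real evaluation step.

import argparse
import json
from pathlib import Path


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--data", required=True)
    parser.add_argument("--model-out", required=True)
    parser.add_argument("--metrics-out", required=True)
    args = parser.parse_args()

    # ... load data from args.data, train, and save the model to args.model_out ...

    # Placeholder values; in practice these come from evaluating the trained model
    metrics = {
        "accuracy": 0.875,
        "precision": 0.85,
        "recall": 0.90,
        "f1_score": 0.874,
        "validation_loss": 0.25,
    }

    # Write plain JSON so DVC can parse the key-value pairs as metrics
    out_path = Path(args.metrics_out)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with open(out_path, "w") as f:
        json.dump(metrics, f, indent=2)


if __name__ == "__main__":
    main()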
Once you have defined metrics in dvc.yaml and run the pipeline using dvc repro, you can use DVC commands to inspect the results.
Show Current Metrics: The dvc metrics show command displays the metrics from the files specified in dvc.yaml for your current workspace and recent Git commits.
$ dvc metrics show
Path accuracy precision recall f1_score validation_loss
results/metrics.json 0.875 0.85 0.90 0.874 0.25
# Output might also show metrics from previous commits if available
# (e.g., from 'main' branch or specific tags)
Compare Metrics Between Revisions: The real power comes from comparison. dvc metrics diff compares the metrics between your current workspace and a specific Git revision (like a branch, tag, or commit hash), or between two revisions.
# Compare workspace against the main branch
$ dvc metrics diff main
Path Metric main workspace Change
results/metrics.json accuracy 0.860 0.875 0.015
results/metrics.json precision 0.84 0.85 0.01
results/metrics.json recall 0.88 0.90 0.02
results/metrics.json f1_score 0.859 0.874 0.015
results/metrics.json validation_loss 0.28 0.25 -0.03
# Compare two specific commits or tags
$ dvc metrics diff v1.0 v1.1
This command immediately highlights how performance changed due to modifications tracked between the specified Git revisions.
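In practice this enables a tight iteration loop: adjust the code or parameters, reproduce the pipeline, compare against the last committed results, and commit when satisfied. A sketch of that loop, using the file names from the example above:

# After editing src/train.py or params.yaml, re-run the affected stages
$ dvc repro

# Compare the new workspace metrics against the last commit (HEAD)
$ dvc metrics diff

# If the change is an improvement, commit the updated pipeline state and metrics
$ git add dvc.yaml dvc.lock results/metrics.json params.yaml src/train.py
$ git commit -m "Adjust training parameters"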
While dvc metrics show and dvc metrics diff provide tabular data, visualizing trends can be more insightful. DVC offers basic plotting capabilities via dvc plots to track how metrics change across your Git history.
DVC can automatically generate plots if your metrics files follow certain structures (like CSV/TSV or JSON lists of objects). You can also define custom plot configurations in dvc.yaml using Vega-Lite templates for more control.
For a simple case, if your results/metrics.json consistently contains the same keys across commits, you can generate plots comparing revisions:
# Compare plots between the main branch and the workspace
# (renders an HTML report you can open in a browser)
$ dvc plots diff main

# More commonly, define plot configurations in dvc.yaml for easier rendering
Adding a plots section to dvc.yaml allows defining named plots:
plots:
  # Simple plot using the linear template, showing accuracy across revisions
  - results/metrics.json:
      x: revision            # Plot values against Git revisions
      y: accuracy
      title: Accuracy Trend
      template: linear       # Use the linear plot template

  # Named plot comparing multiple metrics from the same file
  - performance_metrics:
      template: linear
      x: revision
      y:
        results/metrics.json: [accuracy, f1_score]   # Plot accuracy and F1
      title: Performance Metrics Over Revisions
      x_label: Git Revision
      y_label: Score
With such definitions in place, you can use commands like dvc plots show performance_metrics or dvc plots diff main --targets performance_metrics to generate visualizations (often output as HTML reports, or as Vega-Lite JSON specifications for rendering elsewhere).
DVC metrics and plots link metric file versions directly to Git commits, enabling tracking of performance changes alongside code and pipeline structure evolution.
While DVC provides valuable capabilities for tracking and comparing metrics tied to pipeline runs and Git history, it focuses primarily on the outputs of defined, reproducible stages. For a more comprehensive view including hyperparameter tuning, arbitrary script runs, detailed artifact logging, and a richer UI for exploration, MLflow remains the tool of choice. The next section shows how to bridge these two systems, ensuring that the metrics tracked by your DVC pipeline are also captured within your MLflow experiments for unified analysis.