Once you start logging multiple runs for an experiment, the next logical step is to compare them to understand how different parameters influenced the results and identify the most promising models. MLflow provides excellent tools for this analysis, both visually through its UI and programmatically using its API. Comparing runs is fundamental to iterating effectively on your models.
The MLflow UI is often the quickest way to get an overview and compare different attempts within an experiment.
Navigate to Your Experiment: Launch the MLflow UI (usually by running mlflow ui in your terminal) and navigate to the specific experiment containing the runs you want to compare. You'll see a table listing all the runs, showing some default parameters and metrics.
Select Runs for Comparison: Check the boxes next to the runs you are interested in comparing.
Click "Compare": Once you've selected two or more runs, a "Compare" button will become active near the top of the table. Clicking this button takes you to a dedicated comparison view.
The comparison view offers several ways to analyze the selected runs:
Parameter Comparison: A table highlights the differences in parameters across the selected runs. Parameters that are identical across all selected runs are typically hidden by default to focus on what changed. This is useful for quickly seeing which hyperparameter adjustments were made.
Metric Comparison: Similarly, a table displays the logged metrics for each selected run. This allows for direct comparison of performance indicators like accuracy, loss, precision, or recall.
Parallel Coordinates Plot: This plot provides a multi-dimensional view, visualizing how different parameter combinations relate to outcomes (metrics). Each run is represented by a line that passes through axes representing selected parameters and metrics. It can help identify correlations and trade-offs. For example, you might see that lower learning rates tend to lead to higher accuracy, but only up to a certain point.
Scatter Plots: You can generate scatter plots to explore the relationship between any two parameters, a parameter and a metric, or two metrics. For instance, plotting learning_rate against validation_accuracy can reveal how this specific hyperparameter impacts performance for the selected runs (a programmatic sketch of a similar plot follows the figure below).
Comparing validation accuracy achieved with different learning rates across several runs.
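The UI generates these plots for you, but you can also recreate a similar scatter plot programmatically. The following is a minimal sketch using matplotlib together with mlflow.search_runs (introduced below); the experiment name and the learning_rate and validation_accuracy names are assumptions based on the examples in this section, so substitute your own.
import matplotlib.pyplot as plt
import mlflow

# Fetch all runs from the experiment as a Pandas DataFrame
experiment = mlflow.get_experiment_by_name("MNIST Classification")
runs_df = mlflow.search_runs(experiment_ids=[experiment.experiment_id])

# Parameters are logged as strings, so convert before plotting
learning_rates = runs_df["params.learning_rate"].astype(float)
val_accuracy = runs_df["metrics.validation_accuracy"]

plt.scatter(learning_rates, val_accuracy)
plt.xscale("log")  # learning rates usually span orders of magnitude
plt.xlabel("learning_rate")
plt.ylabel("validation_accuracy")
plt.title("Validation accuracy vs. learning rate")
plt.show()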
Using these UI features, you can filter runs based on parameters or metrics, sort them, and visually inspect the relationships between inputs (parameters, code versions) and outputs (metrics, artifacts). This interactive exploration is invaluable for building intuition about your model's behavior.
While the UI is great for interactive exploration, you might need to perform more complex analysis, automate comparisons, or integrate comparison results into reports. The MLflow Python API provides the mlflow.search_runs function for this purpose.
mlflow.search_runs allows you to query runs within one or more experiments based on various criteria. You can filter runs using a SQL-like query language applied to parameters, metrics, and run attributes (like tags or start time).
Here’s how you can use it:
import mlflow
import pandas as pd
# Ensure you are connected to the correct MLflow tracking server if not local
# mlflow.set_tracking_uri("http://your-mlflow-server:5000")
# Specify the experiment ID(s) you want to search within
# You can get experiment IDs from the UI or using mlflow.get_experiment_by_name
experiment_name = "MNIST Classification"
experiment = mlflow.get_experiment_by_name(experiment_name)
experiment_ids = [experiment.experiment_id] # Can be a list of IDs
# Define a filter string (optional)
# Example: Find runs with learning_rate > 0.001 and accuracy > 0.9
filter_string = "params.learning_rate > '0.001' and metrics.accuracy > 0.9"
# Define ordering (optional)
# Example: Order by accuracy descending
order_by = ["metrics.accuracy DESC"]
# Fetch the runs into a Pandas DataFrame
runs_df = mlflow.search_runs(
    experiment_ids=experiment_ids,
    filter_string=filter_string,
    order_by=order_by
)
# Display the first few rows and selected columns
# Note: Parameters are prefixed with 'params.', metrics with 'metrics.', tags with 'tags.'
print(runs_df[['run_id', 'params.learning_rate', 'params.epochs', 'metrics.accuracy', 'metrics.loss']].head())
# Perform further analysis with Pandas
if not runs_df.empty:
    best_run = runs_df.iloc[0]  # Because we ordered by accuracy DESC
    print(f"\nBest Run ID: {best_run['run_id']}")
    print(f"  Learning Rate: {best_run['params.learning_rate']}")
    print(f"  Epochs: {best_run['params.epochs']}")
    print(f"  Accuracy: {best_run['metrics.accuracy']:.4f}")
    print(f"  Loss: {best_run['metrics.loss']:.4f}")
else:
    print("No runs found matching the criteria.")
This code snippet demonstrates how to locate an experiment by name, filter its runs with a SQL-like filter string, order the results by a metric, load them into a Pandas DataFrame, and pull out the best run for closer inspection.
Using mlflow.search_runs gives you the flexibility to integrate experiment comparison directly into your analysis scripts or automated reporting workflows. You can easily compare hundreds or thousands of runs programmatically to identify trends, optimal parameter ranges, or the single best-performing model according to your defined criteria.
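As a concrete illustration, here is a minimal sketch of that kind of trend analysis. It builds on the runs_df DataFrame from the snippet above and assumes the same parameter and metric names (params.learning_rate and metrics.accuracy); it is one possible approach with Pandas, not a built-in MLflow feature.
# Summarize accuracy by learning rate across many runs to spot trends.
# Assumes runs_df was produced by mlflow.search_runs as shown earlier.
summary = (
    runs_df
    .assign(lr=runs_df["params.learning_rate"].astype(float))  # params are logged as strings
    .groupby("lr")["metrics.accuracy"]
    .agg(["count", "mean", "max"])
    .sort_values("mean", ascending=False)
)
print(summary)
# The top row shows the learning rate with the highest mean accuracy
# across the matching runs.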
By systematically logging your experiments and leveraging MLflow's comparison tools, you move from ad-hoc trial-and-error to a more structured, data-driven approach to model development. This significantly improves your ability to understand results, iterate faster, and reproduce successful outcomes.