As introduced earlier, machine learning development is an iterative process. You might try dozens or even hundreds of variations involving different algorithms, feature sets, data preprocessing steps, and hyperparameter values. Keeping track of what worked, what didn't, and exactly how a specific result was generated becomes challenging very quickly. Relying on manual notes, spreadsheets, or complex file naming conventions is often error-prone and difficult to scale.
This is where MLflow Tracking comes in. It's a component of the open-source MLflow platform designed specifically to address the challenge of managing the machine learning lifecycle, with a focus on logging and organizing experiments. Instead of manually recording details, you instrument your code to automatically capture the essential information about each training execution.
MLflow Tracking organizes your work around a few central ideas. Understanding these concepts is fundamental to using the tool effectively, and a short code sketch after the list shows how each one maps to an API call:
Run: A single execution of your model training code (or any piece of data science code you want to track). Each time you run your training script, you typically initiate a new MLflow Run. MLflow assigns a unique ID to each run.
Parameters: These are the input settings for a run. Think of them as the configuration or hyperparameters you want to record. Examples include the learning rate of an optimizer, the number of layers in a neural network, the value of a regularization parameter (like C in SVM), or the path to the input dataset version (perhaps managed by DVC). Logging parameters allows you to know exactly what configuration produced a specific result.
Metrics: These are quantitative outputs or results you want to measure and compare across runs. Metrics are typically numeric values that evaluate the performance of your model, such as accuracy, precision, recall, F1-score, Mean Squared Error (MSE), or Area Under the Curve (AUC). MLflow allows you to log metrics at the end of a run or even multiple times throughout a run (e.g., logging the training loss after each epoch). This is particularly useful for observing model convergence. Metrics are stored with timestamps.
Artifacts: These are output files associated with a run. Artifacts can be anything: a serialized model file (like a pickled scikit-learn model or a saved TensorFlow/PyTorch model), images (like performance plots or data visualizations), data files (like processed features or model predictions), or even text files containing logs or notes. MLflow stores these files, allowing you to retrieve the exact outputs generated by a specific run.
Source Code Version: To ensure full reproducibility, MLflow can automatically record the version of the code used for a run. If your project is managed with Git, MLflow typically logs the Git commit hash. This links the specific code state to the parameters, metrics, and artifacts of the run.
Experiment: An experiment is a way to group related runs. Think of it as a workspace for a specific task or project, such as "Predicting Customer Churn" or "Optimizing ResNet50 Hyperparameters". All runs are logged within the context of an experiment. If you don't specify one, MLflow uses a default experiment.
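To make these ideas concrete, here is a minimal sketch of how they map to MLflow's Python API. It assumes the mlflow package is installed; the experiment name, parameter values, and file names are purely illustrative.

```python
import mlflow

mlflow.set_experiment("churn-experiment")      # Experiment: groups related runs

with mlflow.start_run() as run:                # Run: one tracked execution
    print(f"Run ID: {run.info.run_id}")        # MLflow assigns a unique ID

    # Parameters: input settings for this run
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_param("num_layers", 3)

    # Metrics: numeric results; the optional step lets you log per epoch
    for epoch in range(5):
        train_loss = 1.0 / (epoch + 1)         # placeholder value
        mlflow.log_metric("train_loss", train_loss, step=epoch)
    mlflow.log_metric("accuracy", 0.92)

    # Artifacts: output files associated with the run
    with open("notes.txt", "w") as f:
        f.write("Baseline configuration")
    mlflow.log_artifact("notes.txt")

    # Source code version: if this script lives in a Git repository,
    # MLflow typically records the commit hash for the run automatically.
```

Running this script twice would produce two separate runs under the same experiment, each with its own ID, parameters, metrics, and artifacts.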
The following diagram illustrates how these components relate: An Experiment contains multiple Runs. Each Run executes some code (identified by its version), uses specific Parameters, produces Metrics, and generates Artifacts.
Relationship between MLflow Tracking components: Experiments group Runs, and each Run logs Parameters, Metrics, Artifacts, and is linked to a Code Version.
MLflow Tracking consists of two main parts:
MLflow Client (API/SDK): This is how you interact with MLflow from your code. MLflow provides libraries for Python, R, Java, and a REST API. You use functions like mlflow.log_param(), mlflow.log_metric(), and mlflow.log_artifact() within your scripts to send information about your run to the tracking backend.
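The client can also read logged data back into your code, which is handy for comparing runs programmatically rather than only through the UI. A brief sketch, assuming an experiment named "churn-experiment" already contains runs with the parameters and metrics logged in the earlier example:

```python
import mlflow

# Returns a pandas DataFrame with one row per run, including columns
# like "params.<name>" and "metrics.<name>" for everything that was logged.
runs = mlflow.search_runs(experiment_names=["churn-experiment"])
print(runs[["run_id", "params.learning_rate", "metrics.accuracy"]])
```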
Tracking Server & Backend Storage: This is where the information logged by the client is stored and managed. MLflow supports several backend configurations:
mlruns
directory. This is simple for getting started but less suitable for collaboration or remote execution.The Tracking Server also provides a web-based User Interface (UI), which allows you to browse, search, and compare experiments and runs visually.
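As an illustration, switching between backends is usually just a matter of setting the tracking URI before logging. The addresses below are placeholders; a shared server would typically be started separately with the mlflow server command.

```python
import mlflow

# Default behavior: log to a local ./mlruns directory (no configuration needed).
# mlflow.set_tracking_uri("file:./mlruns")

# Shared setup: point the client at a remote tracking server
# (assumed here to be running at localhost:5000).
mlflow.set_tracking_uri("http://localhost:5000")
```

With the local file backend, the same UI can be opened against the mlruns directory by running mlflow ui in that folder.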
By adopting MLflow Tracking, you gain a systematic way to record your ML experiments. This brings numerous benefits: every result can be traced back to the exact code version, parameters, and data that produced it; runs can be compared side by side using their logged metrics; and the artifacts of any run, such as trained models and plots, remain available for later retrieval.
In the following sections, we'll explore the practical aspects of setting up MLflow and using its API to instrument your training code.