Now that we've covered the concepts behind MLflow Tracking, let's put them into practice. This hands-on exercise will guide you through instrumenting a basic machine learning training script to log parameters, metrics, and a model artifact using the MLflow Python API. We'll use a familiar example from scikit-learn to keep the focus squarely on the tracking process.
Make sure you have MLflow and scikit-learn installed (pip install mlflow scikit-learn).
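To confirm the installation (a quick sanity check; any reasonably recent MLflow release should work for this exercise), you can print the installed versions:
python -c "import mlflow, sklearn; print(mlflow.__version__, sklearn.__version__)"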
First, let's look at a simple script that trains a Logistic Regression model on the Iris dataset without any MLflow tracking. This serves as our starting point.
# baseline_train.py
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
# Load data
iris = load_iris()
X, y = iris.data, iris.target
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Define model parameters (these are what we'll track)
solver = 'liblinear'
C = 1.0 # Inverse of regularization strength
random_state = 42
# Train the model
model = LogisticRegression(solver=solver, C=C, random_state=random_state)
model.fit(X_train, y_train)
# Make predictions
y_pred = model.predict(X_test)
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Parameters:")
print(f" solver: {solver}")
print(f" C: {C}")
print(f" random_state: {random_state}")
print(f"Metrics:")
print(f" Accuracy: {accuracy:.4f}")
# In a real scenario, you might save the model here
# joblib.dump(model, 'model.joblib')
Running this script (python baseline_train.py) simply prints the parameters and the final accuracy to the console. If you run it again with different parameters, the previous results are lost unless you manually record them somewhere. This is exactly the problem MLflow solves.
Now, let's modify the script to use MLflow Tracking. The changes are:
1. Import mlflow and import mlflow.sklearn at the beginning.
2. Add a with mlflow.start_run(): block to encapsulate the training and evaluation logic. Everything logged within this block will belong to a single MLflow run.
3. Use mlflow.log_param() to record the hyperparameters used for this specific run (e.g., solver, C, random_state).
4. Use mlflow.log_metric() to record the evaluation results (e.g., accuracy).
5. Use mlflow.sklearn.log_model() to save the trained scikit-learn model as an artifact associated with the run. This function handles the serialization and saves the necessary metadata.
Here is the modified script:
# mlflow_train.py
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
# Import MLflow
import mlflow
import mlflow.sklearn
# Optional: Set an experiment name. If it doesn't exist, it will be created.
# If you don't set this, runs will go to the 'Default' experiment.
mlflow.set_experiment("Iris Classification")
# Load data
iris = load_iris()
X, y = iris.data, iris.target
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Define model parameters
solver = 'liblinear'
C = 1.0
random_state = 42
# Start an MLflow run
with mlflow.start_run():
print("Starting MLflow run...")
# Log parameters
mlflow.log_param("solver", solver)
mlflow.log_param("C", C)
mlflow.log_param("random_state", random_state)
print(f" Logged parameters: solver={solver}, C={C}, random_state={random_state}")
# Train the model
model = LogisticRegression(solver=solver, C=C, random_state=random_state)
model.fit(X_train, y_train)
# Make predictions
y_pred = model.predict(X_test)
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
# Log metrics
mlflow.log_metric("accuracy", accuracy)
print(f" Logged metric: accuracy={accuracy:.4f}")
# Log the trained model artifact
# 'model' is the directory name within the run's artifact store
# 'iris_logistic_regression' is the registered model name (optional, for Model Registry)
mlflow.sklearn.log_model(model, "model", registered_model_name="iris_logistic_regression")
print(" Logged model artifact")
print("MLflow run finished.")
print("Script execution complete.")
Now, execute the instrumented script from your terminal:
python mlflow_train.py
You will see output similar to the baseline script, but with additional messages indicating MLflow logging. Notice that by default, MLflow creates a local mlruns directory in the same location where you ran the script. This directory stores the metadata and artifacts for your runs.
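Because everything under mlruns is stored in a queryable form, you can also inspect your runs programmatically instead of through the UI. Here is a minimal sketch using mlflow.search_runs, which returns a pandas DataFrame (the experiment_names argument assumes a recent MLflow version; the column selection matches what our script logs):
# inspect_runs.py (a small helper sketch, not part of the exercise)
import mlflow

# Fetch all runs recorded under our experiment
runs = mlflow.search_runs(experiment_names=["Iris Classification"])

# Each logged parameter and metric becomes a column
print(runs[["run_id", "params.C", "metrics.accuracy"]])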
To visualize and compare your runs, launch the MLflow Tracking UI. Navigate to the directory containing your mlruns folder in the terminal and run:
mlflow ui
This command starts a local web server, typically at http://127.0.0.1:5000. Open this URL in your web browser.
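If another application is already using port 5000, you can pick a different one with the --port option of mlflow ui:
mlflow ui --port 5001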
In the UI, you should see:
1. An experiment named "Iris Classification" in the experiment list (created by our call to mlflow.set_experiment).
2. A run recorded under that experiment, showing the solver, C, and random_state values you logged.
3. The logged accuracy score will be displayed. You can also view plots if you log metrics over time (e.g., loss per epoch).
4. The model artifact, containing the saved files (model.pkl, conda.yaml, python_env.yaml, and MLmodel). You can load this artifact back into Python, as sketched below.
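Because the model was logged as an artifact, you can load it back and make predictions with it. A minimal sketch, assuming you copy a run ID from the UI (the runs:/<run_id>/model URI points at the "model" artifact directory we logged):
import mlflow.sklearn

# Replace <run_id> with a real run ID copied from the MLflow UI
loaded_model = mlflow.sklearn.load_model("runs:/<run_id>/model")

# The loaded object behaves like the original scikit-learn model
print(loaded_model.predict([[5.1, 3.5, 1.4, 0.2]]))  # one Iris-like sample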
The real benefit comes from tracking multiple runs. Try modifying the mlflow_train.py script:
1. Change C (e.g., C = 0.5).
2. Change the solver (e.g., solver = 'saga', but note you might need more max_iter for convergence).
Run the script again after each change. Then, refresh the MLflow UI. You will see new runs listed in the table. You can now compare the runs side by side, sort them by their metrics, and filter them with search queries (e.g., params.C = "0.5" or metrics.accuracy > 0.95). For a larger sweep, you can also script the changes, as sketched below.
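Editing the script by hand works for a few runs, but you can also generate several comparable runs in one go. The loop below is an illustrative sketch (the file name and the list of C values are our own choices, not part of the original script); each iteration produces its own run in the same experiment:
# sweep_train.py (illustrative variant of mlflow_train.py)
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

mlflow.set_experiment("Iris Classification")

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# One MLflow run per candidate value of C
for C in [0.01, 0.1, 0.5, 1.0]:
    with mlflow.start_run():
        mlflow.log_param("solver", "liblinear")
        mlflow.log_param("C", C)
        model = LogisticRegression(solver="liblinear", C=C, random_state=42)
        model.fit(X_train, y_train)
        accuracy = accuracy_score(y_test, model.predict(X_test))
        mlflow.log_metric("accuracy", accuracy)
        print(f"C={C}: accuracy={accuracy:.4f}")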
This hands-on exercise demonstrated the core workflow of MLflow Tracking: instrumenting your training code to log parameters, metrics, and artifacts, and then using the UI to review and compare your experiments. By adopting this practice, you create a systematic record of your modeling efforts, significantly improving reproducibility and making it easier to iterate on your models effectively.