While logging hyperparameters and performance metrics gives you a quantitative overview of your experiment runs, it often doesn't tell the whole story. To fully understand and reproduce an experiment, you frequently need access to the outputs generated during the run. These outputs, known as artifacts in MLflow, can include a wide range of files such as the trained model itself, visualizations of performance, evaluation reports, or even subsets of processed data.
MLflow provides simple mechanisms to save these outputs alongside your logged parameters and metrics, associating them directly with the specific run that produced them. This ensures that all components needed to evaluate or recreate a result are stored together.
Think of artifacts as any file output produced by your machine learning code that you want to save for a given run. Common examples include:
- Serialized model files (for example, .pkl or .pt files)
- Plots and other performance visualizations, such as confusion matrices or ROC curves
- Evaluation reports
- Samples or subsets of processed data

Logging these artifacts matters because it keeps every output tied to the exact run that produced it, which makes results far easier to analyze, reproduce, and reuse later.
The most fundamental way to log an artifact is using the mlflow.log_artifact() function. This function takes a path to a local file and saves a copy to the run's artifact repository.
Let's say you've generated a plot using Matplotlib and saved it to a file named confusion_matrix.png. You can log it like this:
import mlflow
import matplotlib.pyplot as plt

# Assuming 'model', 'X_test', 'y_test' are defined
# and you have a function plot_confusion_matrix

# Generate and save the plot
fig, ax = plt.subplots()
# plot_confusion_matrix(model, X_test, y_test, ax=ax)  # Your plotting logic
ax.set_title("Confusion Matrix")
plt.savefig("confusion_matrix.png")
plt.close(fig)  # Close the plot to free memory

# Start an MLflow run (or use an existing one)
with mlflow.start_run():
    # Log parameters and metrics as usual...
    mlflow.log_param("solver", "liblinear")
    mlflow.log_metric("accuracy", 0.85)

    # Log the plot file as an artifact
    mlflow.log_artifact("confusion_matrix.png")

    print(f"Run ID: {mlflow.active_run().info.run_id}")
    print("Artifact 'confusion_matrix.png' logged.")
After this code runs, the confusion_matrix.png file will be copied to the run's artifact location (typically within the mlruns directory locally, or in configured remote storage). You can specify an optional artifact_path argument to organize artifacts within subdirectories in the artifact store. For instance, mlflow.log_artifact("confusion_matrix.png", artifact_path="plots") would place the file inside a plots folder within the run's artifact view.
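As a convenience, reasonably recent MLflow versions also provide helpers such as mlflow.log_figure() and mlflow.log_text() that log an in-memory object directly, skipping the intermediate local file. The following is a minimal sketch assuming one of these newer versions is installed; the file names and artifact paths shown are just illustrative choices.

import mlflow
import matplotlib.pyplot as plt

# Build a figure in memory
fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1])
ax.set_title("Example Plot")

with mlflow.start_run():
    # Log the figure directly under the 'plots/' artifact path,
    # without calling plt.savefig() first
    mlflow.log_figure(fig, "plots/example_plot.png")

    # Log a small piece of text the same way
    mlflow.log_text("threshold: 0.5", "notes/threshold.txt")

plt.close(fig)

If your MLflow version lacks these helpers, the save-then-log_artifact() pattern shown above works everywhere.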
Sometimes you need to log multiple files, perhaps contained within a directory. For example, you might have a directory containing several plots or configuration files. Instead of logging each file individually, you can use mlflow.log_artifacts() (note the plural 's').
import mlflow
import os

# Create a directory with some dummy files
os.makedirs("run_outputs", exist_ok=True)
with open("run_outputs/config.yaml", "w") as f:
    f.write("learning_rate: 0.01\n")
with open("run_outputs/feature_importances.txt", "w") as f:
    f.write("feature1: 0.8\nfeature2: 0.2\n")

# Start an MLflow run
with mlflow.start_run():
    mlflow.log_param("feature_set", "v2")
    mlflow.log_metric("auc", 0.78)

    # Log the entire directory
    mlflow.log_artifacts("run_outputs", artifact_path="outputs")

    print(f"Run ID: {mlflow.active_run().info.run_id}")
    print("Directory 'run_outputs' logged to 'outputs'.")

# Clean up local dummy files/directory if needed
# import shutil
# shutil.rmtree("run_outputs")
This command logs the entire contents of the local run_outputs directory into the outputs subdirectory within the run's artifact store.
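Logged artifacts can also be pulled back down programmatically, which is useful in downstream scripts or CI jobs. Here is a minimal sketch, assuming a recent MLflow version that includes the mlflow.artifacts module; the run_id value is a placeholder you would replace with an ID printed by one of the runs above.

import mlflow

run_id = "abc123"  # placeholder, replace with a real run ID

# Download the run's 'outputs' artifact directory to a local folder
# and get back the local path where the files were placed
local_dir = mlflow.artifacts.download_artifacts(
    run_id=run_id,
    artifact_path="outputs",
)
print(f"Artifacts downloaded to: {local_dir}")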
While you can log model files using mlflow.log_artifact(), MLflow provides specialized functions for logging models that offer significant advantages. These functions, available through framework-specific flavors like mlflow.sklearn, mlflow.pytorch, mlflow.tensorflow, etc., not only save the model file but also include extra metadata:
- The model's software environment, captured in a conda.yaml or requirements.txt file.
- An MLmodel file: an important metadata file that describes the model, its flavor, and how to load and use it.

Using these functions makes models more self-contained and easier to reuse or deploy later using MLflow's model serving or registry tools.
Here's an example using mlflow.sklearn.log_model():
import mlflow
import mlflow.sklearn
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score

# Generate sample data
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a simple model
model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

# Start an MLflow run
with mlflow.start_run() as run:
    # Log parameters and metrics
    mlflow.log_param("model_type", "LogisticRegression")
    mlflow.log_metric("accuracy", accuracy)

    # Log the scikit-learn model
    # 'sk_model' is the directory name within the artifact store
    # 'registered_model_name' is optional for registering the model
    mlflow.sklearn.log_model(
        sk_model=model,
        artifact_path="sk_model",
        # input_example=X_train[:5],  # Optional: log an input schema example
        # signature=mlflow.models.infer_signature(X_train, model.predict(X_train))  # Optional: log input/output signature
    )

    print(f"Run ID: {run.info.run_id}")
    print("Model logged to artifact path 'sk_model'.")
    print(f"Accuracy: {accuracy:.4f}")
When you log a model this way, MLflow creates a directory (e.g., sk_model) in the artifact store containing the serialized model file (model.pkl in this case), a conda.yaml file, a requirements.txt file, and the MLmodel metadata file. This packaging makes the model significantly easier to manage and redeploy compared to just saving the raw .pkl file.
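One practical payoff of this packaging is that the model can later be reloaded by URI, without re-specifying how it was serialized. A short sketch of what that looks like, using a placeholder run_id in place of the ID printed by the example above:

import mlflow
import mlflow.sklearn
import mlflow.pyfunc

run_id = "abc123"  # placeholder, use the run ID printed by the previous example
model_uri = f"runs:/{run_id}/sk_model"

# Load the model back as a scikit-learn object
loaded_model = mlflow.sklearn.load_model(model_uri)

# Or load it as a generic pyfunc model, independent of the training framework
pyfunc_model = mlflow.pyfunc.load_model(model_uri)

# Either object can now produce predictions
# (assuming X_test from the training script is available)
# predictions = loaded_model.predict(X_test)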
Once artifacts are logged, you can easily access them through the MLflow UI. Navigate to the specific run's page, and you'll find an "Artifacts" section. This section presents a file browser interface where you can navigate the directories and view or download the logged files. Images logged as artifacts are often rendered directly in the UI, allowing quick visual inspection of plots like the confusion matrix example earlier. Models logged using the log_model functions will show the MLmodel file and associated dependency files.
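If you prefer to inspect artifacts from code rather than the browser, the tracking client can list what a run has stored. A brief sketch, again using a placeholder run_id and assuming the run has a plots folder like the earlier examples:

from mlflow.tracking import MlflowClient

client = MlflowClient()
run_id = "abc123"  # placeholder run ID

# List the run's top-level artifacts (files and folders)
for info in client.list_artifacts(run_id):
    print(info.path, "(dir)" if info.is_dir else f"{info.file_size} bytes")

# Drill into a specific artifact folder
for info in client.list_artifacts(run_id, path="plots"):
    print(info.path)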
For example, an ROC curve logged as an image artifact is rendered directly on the run's page in the MLflow UI, so you can inspect it without downloading the file.
Logging artifacts is a fundamental part of effective experiment tracking. It ensures that the essential outputs of your work (models, plots, configurations, and reports) are preserved and linked directly to the specific code version, parameters, and metrics that produced them, greatly enhancing your ability to analyze, reproduce, and build upon your results.