Now that you understand why systematically tracking your machine learning experiments is valuable, let's get practical and set up MLflow. Fortunately, getting started with MLflow is straightforward.
MLflow is a Python package and can be installed using pip, the standard Python package installer. Open your terminal or command prompt and run:
pip install mlflow
This command installs the core MLflow library, including the Tracking component, which is our focus in this chapter. Depending on your project's needs, you might later install additional MLflow extras (like mlflow[extras] for database backends or specific ML library integrations), but the base installation is sufficient for now.
To verify the installation, you can run:
mlflow --version
This should print the installed MLflow version number.
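You can also confirm the installation from within Python itself:
import mlflow

# Quick check that the package imports correctly
print(mlflow.__version__)  # prints the installed version, e.g. "2.9.2"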
MLflow Tracking is designed to be flexible, allowing you to log runs in different ways depending on your setup and requirements. When your Python script uses MLflow logging functions (like mlflow.log_param() or mlflow.log_metric(), which we'll cover next), it needs to know where to send this information. This destination is called the tracking backend. MLflow supports several backend types:
Local Filesystem (Default): This is the simplest option and the default behavior if you don't configure anything else. When you run an MLflow-instrumented script, MLflow automatically creates a directory named mlruns in the current working directory where your script is executed. Inside mlruns, it stores run metadata, parameters, metrics, and artifacts in a structured file format. This is great for individual experimentation and getting started quickly.
Local Tracking Server: You can run a local MLflow tracking server. This provides a dedicated user interface (UI) accessible via your web browser to view and compare runs, even if they originated from different directories or projects on your machine. The server still stores data on the local filesystem by default (using a specified directory), but centralizes access.
Remote Tracking Server: For collaboration or more persistent storage, you can configure MLflow to log to a remote tracking server. This server can store tracking data in a more robust backend like a relational database (e.g., PostgreSQL, MySQL) and artifacts in shared storage (like S3, Azure Blob Storage, GCS, or NFS). This setup is common in team environments or MLOps pipelines.
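For reference only (we won't need it in this chapter), a remote-style setup might be launched with a command along these lines; the database URI and bucket name below are placeholders, not values to copy verbatim:
# Illustrative sketch: metadata in PostgreSQL, artifacts in S3
# (assumes the relevant drivers, such as psycopg2 and boto3, are installed)
mlflow server \
  --backend-store-uri postgresql://user:password@db-host:5432/mlflow_db \
  --default-artifact-root s3://your-bucket/mlflow-artifacts \
  --host 0.0.0.0 \
  --port 5000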
Let's look at how to use the first two options.
As mentioned, the default local filesystem backend requires no explicit setup beyond installing MLflow. If you run a Python script containing MLflow logging commands, it will create the mlruns directory automatically.
# Example: simple_script.py
import mlflow
import os

# No server setup needed; MLflow logs to ./mlruns by default
mlflow.start_run()

# Log some example data
mlflow.log_param("learning_rate", 0.01)
mlflow.log_metric("accuracy", 0.95)

# Create a dummy artifact file and log it
os.makedirs("outputs", exist_ok=True)
with open("outputs/model.txt", "w") as f:
    f.write("This is a dummy model file.")
mlflow.log_artifact("outputs/model.txt")

mlflow.end_run()
print("MLflow run logged to local './mlruns' directory.")
If you execute python simple_script.py, you'll find that a new mlruns directory has appeared.
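With the default file-based store, the contents of mlruns look roughly like this (the exact layout can vary slightly between MLflow versions; the run ID is a generated hash, shown here as a placeholder):
mlruns/
└── 0/                    # experiment ID (0 is the "Default" experiment)
    └── <run_id>/         # one directory per run
        ├── meta.yaml     # run metadata
        ├── params/       # one file per logged parameter
        ├── metrics/      # one file per logged metric
        ├── tags/         # run tags
        └── artifacts/    # logged files, e.g. model.txt
You normally don't edit these files by hand; the MLflow UI and API read them for you.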
To view these local runs in the MLflow UI, navigate in your terminal to the directory containing the mlruns folder and run:
mlflow ui
This command starts a local web server (usually at http://127.0.0.1:5000, also reachable as http://localhost:5000) that serves the MLflow Tracking UI. Open this address in your browser, and you'll see the results of your script runs. Press Ctrl+C in the terminal to stop the UI server.
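If another process is already using port 5000, you can serve the UI on a different port:
mlflow ui --port 5001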
While mlflow ui reads directly from mlruns, sometimes you want a persistent server process that can accept runs from scripts running anywhere on your machine (or even from other machines on your local network).
First, start the tracking server. Choose a directory where you want the server to store its data (this will contain the mlruns equivalent). Let's use a directory named mlflow_server_data.
# Create a directory for the server's data
mkdir mlflow_server_data
# Start the MLflow server
# --backend-store-uri: Specifies where to store run metadata, metrics, params, tags.
# --default-artifact-root: Specifies where to store artifacts (like models, files).
# --host: The network interface to listen on (127.0.0.1 is localhost)
# --port: The port to use (default is 5000, using 8080 here)
mlflow server \
--backend-store-uri ./mlflow_server_data \
--default-artifact-root ./mlflow_server_data \
--host 127.0.0.1 \
--port 8080
This command starts the server process in your terminal. It will keep running until you stop it (e.g., with Ctrl+C).
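As an aside, --backend-store-uri also accepts database URIs. A common lightweight variant keeps run metadata in a local SQLite file instead of plain files (the filename mlflow.db here is just an example):
# Illustrative variant: SQLite for metadata, local directory for artifacts
mlflow server \
  --backend-store-uri sqlite:///mlflow.db \
  --default-artifact-root ./mlflow_server_data \
  --host 127.0.0.1 \
  --port 8080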
Now you need to tell your Python script to log to this server instead of the local filesystem. You do this by setting the tracking URI with the mlflow.set_tracking_uri() function before starting a run.
# Example: script_log_to_server.py
import mlflow
import os

# Set the tracking URI to point to the running server
mlflow.set_tracking_uri("http://127.0.0.1:8080")

# Now, when we start a run, it logs to the server
mlflow.start_run()
mlflow.log_param("learning_rate", 0.02)
mlflow.log_metric("accuracy", 0.98)

# Create another dummy artifact and log it
os.makedirs("outputs", exist_ok=True)
with open("outputs/advanced_model.txt", "w") as f:
    f.write("This is another dummy model file.")
mlflow.log_artifact("outputs/advanced_model.txt")

mlflow.end_run()
print("MLflow run logged to server at http://127.0.0.1:8080")
If you run python script_log_to_server.py (while the mlflow server command is running in another terminal), the run data will be sent to the server and stored in the mlflow_server_data directory. You can view these runs by opening http://127.0.0.1:8080 in your web browser.
Alternatively, you can set the tracking URI using the MLFLOW_TRACKING_URI environment variable, which often fits better into automated workflows:
export MLFLOW_TRACKING_URI="http://127.0.0.1:8080"
python your_training_script.py
If this environment variable is set, mlflow.start_run() will automatically use it, and you don't need mlflow.set_tracking_uri() in your code.
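If you're ever unsure which backend a script will log to, you can ask MLflow directly:
import mlflow

# Returns the active tracking destination: the MLFLOW_TRACKING_URI value,
# whatever was passed to mlflow.set_tracking_uri(), or a local file:// URI
# pointing at ./mlruns if nothing is configured
print(mlflow.get_tracking_uri())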
Figure: different MLflow tracking setup configurations. A Python script can log directly to a local mlruns directory, to a local mlflow server process that centralizes storage, or to a remote server typically used in team or production environments.
For individual experimentation, the default local filesystem backend (mlruns) together with mlflow ui is often the easiest way to get started. Running a local mlflow server pointed at a shared data directory is a good next step when you need centralized access. For the exercises in this chapter, logging to the local filesystem or using a local mlflow server will be sufficient. Now that MLflow is installed and you understand the basic setup options, let's move on to instrumenting your training code to actually log useful information.