To execute a machine learning training process, docker run is the primary command for creating and starting a container instance from a Docker image. This command runs your training script and all its dependencies within an isolated environment. Using docker run ensures the training environment precisely matches the configuration specified in the Dockerfile, preventing variations from the host machine and improving reproducibility of results.
This section focuses on using the docker run command effectively to launch, manage, and configure containerized machine learning training jobs.
At its core, docker run creates and starts a new container from a specified image. If your Dockerfile includes an ENTRYPOINT or CMD instruction that points to your training script, running the container might be as simple as:
docker run your-ml-training-image:latest
This command instructs Docker to:
Find (and pull, if not present locally) the image your-ml-training-image with the tag latest.
Create and start a new container from that image, executing the default ENTRYPOINT or CMD in the Dockerfile.
Often, you'll want to provide specific arguments to your training script, such as hyperparameters or data paths, or even run a different script within the image. You can do this by appending the command and its arguments after the image name:
# Run train.py with specific arguments
docker run your-ml-training-image:latest python train.py --epochs 20 --batch-size 64
# Run a different script, e.g., data preprocessing
docker run your-ml-training-image:latest python preprocess_data.py --input /raw_data --output /processed_data
When you provide a command after the image name, it overrides the CMD instruction in the Dockerfile. If an ENTRYPOINT is defined (in exec form), the command you provide is instead passed as arguments to the ENTRYPOINT.
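To illustrate the interaction, here is a minimal Dockerfile sketch in which ENTRYPOINT fixes the interpreter and script while CMD supplies default arguments; the base image and script name are illustrative:

```dockerfile
# Hypothetical training image: ENTRYPOINT fixes what always runs,
# CMD holds default arguments that docker run can replace.
FROM python:3.11-slim
WORKDIR /app
COPY train.py .
ENTRYPOINT ["python", "train.py"]
CMD ["--epochs", "10", "--batch-size", "32"]
```

With this setup, docker run your-ml-training-image:latest executes train.py --epochs 10 --batch-size 32, while docker run your-ml-training-image:latest --epochs 20 replaces only the CMD defaults, still running through the ENTRYPOINT.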
Training jobs are rarely self-contained; they need access to datasets and produce outputs like trained models, logs, or evaluation metrics. As discussed in Chapter 3, Docker volumes and bind mounts are the standard mechanisms for this.
You integrate these data management techniques using the -v or --mount flag with docker run.
Using Bind Mounts: Useful for development or when data resides directly on the host machine.
# Mount local ./data to /app/data in the container
# Mount local ./output to /app/output in the container
docker run \
-v $(pwd)/data:/app/data:ro \
-v $(pwd)/output:/app/output \
your-ml-training-image:latest \
python train.py --data-dir /app/data --model-dir /app/output/models
In this example, $(pwd)/data (the data directory in the current host working directory) is mounted read-only (:ro) inside the container at /app/data. The $(pwd)/output directory is mounted read-write at /app/output. The training script inside the container accesses data via /app/data and saves models to /app/output/models, which appear directly in the ./output/models directory on the host.
Using Docker Volumes: Preferred for managing persistent data independent of the host filesystem structure.
# Assume volumes 'training_data_v1' and 'model_artifacts_v1' exist
docker run \
-v training_data_v1:/app/data:ro \
-v model_artifacts_v1:/app/output \
your-ml-training-image:latest \
python train.py --data-dir /app/data --model-dir /app/output/models
Here, Docker-managed volumes are mounted into the container. This decouples data storage from the host's directory layout.
Remember to ensure the paths inside the container (/app/data, /app/output/models) match what your training script expects.
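One way to keep script and container paths aligned is to declare the mount points as argument defaults. The following sketch uses argparse with argument names mirroring the docker run examples above; the script itself is hypothetical:

```python
import argparse

def parse_args(argv=None):
    # Defaults match the mount points used in the docker run examples;
    # pass --data-dir / --model-dir to override when the layout differs.
    parser = argparse.ArgumentParser(description="Training job (sketch)")
    parser.add_argument("--data-dir", default="/app/data",
                        help="Directory the dataset is mounted into")
    parser.add_argument("--model-dir", default="/app/output/models",
                        help="Directory where model artifacts are written")
    return parser.parse_args(argv)

# Simulate the container invocation from the docker run example above
args = parse_args(["--data-dir", "/app/data", "--model-dir", "/app/output/models"])
print(f"data: {args.data_dir}  models: {args.model_dir}")
```

Because the defaults already point at the container paths, the docker run command only needs to pass these flags when the mount locations change.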
As covered previously, environment variables and command-line arguments are common ways to configure training jobs. docker run provides flags for both:
Environment Variables (-e or --env): Suitable for passing configuration like API keys, learning rates, or flags.
docker run \
-e LEARNING_RATE=0.005 \
-e NUM_EPOCHS=30 \
-e WANDB_API_KEY=your_secret_key \
-v ... \
your-ml-training-image:latest \
python train.py
# Assumes train.py reads LEARNING_RATE, NUM_EPOCHS from environment
Command-Line Arguments: Passed directly after the image name (or after the overriding command).
docker run \
-v ... \
your-ml-training-image:latest \
python train.py --learning-rate 0.005 --epochs 30 --log-to-wandb
The choice between them often depends on the training script's design. Environment variables are useful for secrets or settings that might apply across different script executions, while command-line arguments are explicit and often used for run-specific parameters like hyperparameters.
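A common pattern combines the two: environment variables (set via docker run -e) supply defaults, and explicit command-line arguments take precedence. A minimal sketch, with hypothetical variable and function names:

```python
import argparse
import os

def resolve_config(argv=None, env=None):
    # Environment variables provide defaults; explicit CLI arguments win.
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="Config resolution (sketch)")
    parser.add_argument("--learning-rate", type=float,
                        default=float(env.get("LEARNING_RATE", 0.01)))
    parser.add_argument("--epochs", type=int,
                        default=int(env.get("NUM_EPOCHS", 10)))
    return parser.parse_args(argv)

# Only the environment variable is set: its value is used
cfg = resolve_config([], env={"LEARNING_RATE": "0.005"})
print(cfg.learning_rate)  # 0.005

# A CLI argument is also given: it overrides the environment default
cfg = resolve_config(["--learning-rate", "0.1"], env={"LEARNING_RATE": "0.005"})
print(cfg.learning_rate)  # 0.1
```

This keeps run-specific hyperparameters explicit on the command line while letting shared or secret settings travel through the environment.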
By default, docker run runs in the foreground, attaching your terminal to the container's standard output and error streams (and to standard input when -i is given). You'll see the training logs directly in your terminal, and the command prompt will not return until the container exits. This is useful for short jobs or interactive debugging.
For long-running training jobs, you'll typically want to run the container in detached mode using the -d flag:
docker run -d \
--name long_training_run \
-v training_data_v1:/app/data:ro \
-v model_artifacts_v1:/app/output \
-e LEARNING_RATE=0.001 \
your-ml-training-image:latest \
python train.py --data-dir /app/data --model-dir /app/output/models
This command starts the container in the background and prints the container ID. Your terminal prompt returns immediately.
To view the logs of a detached container, use the docker logs command:
# Follow logs in real-time
docker logs -f long_training_run
# Show all existing logs
docker logs long_training_run
Machine learning training can be computationally intensive. docker run allows you to limit the resources a container can consume, preventing a single job from monopolizing host resources.
CPU Limits (--cpus): Specify the number of CPU cores the container can use.
Memory Limits (--memory): Set a maximum amount of RAM.
docker run -d \
--cpus="4" \
--memory="16g" \
--name resource_limited_training \
-v ... \
your-ml-training-image:latest \
python train.py ...
This ensures the container uses at most 4 CPU cores and 16 GB of RAM.
Naming Containers (--name): Assigning a memorable name makes it easier to manage containers (e.g., view logs, stop, remove) instead of relying on auto-generated IDs.
docker run -d --name my_experiment_run_001 ...
docker logs my_experiment_run_001
docker stop my_experiment_run_001
Automatic Cleanup (--rm): For training jobs that are typically run once and don't need to be inspected after completion, the --rm flag is very useful. It automatically removes the container's filesystem when the container exits. This prevents cluttering your system with stopped containers.
# Container will be removed automatically upon completion or error
docker run --rm \
--name transient_training \
-v ... \
your-ml-training-image:latest \
python train.py ...
Note: In older Docker versions, --rm could not be combined with -d; current versions allow both together, and the daemon removes the container automatically when it exits. If a detached container was started without --rm, remove it manually using docker rm <container_name_or_id>.
The docker run Process for Training

The docker run command orchestrates several components to launch your training job within an isolated environment.
Diagram illustrating the flow when using docker run for a training job. The command initiates the process, instructing the Docker daemon to create a container from an image, mount requested data volumes or directories, inject environment variables, and execute the specified training script.
By mastering docker run with its various flags for data mounting, configuration, and lifecycle management, you gain precise control over how your ML training jobs execute, significantly improving consistency and simplifying the process of running experiments across different machines or cloud environments.