Running individual training jobs using docker run offers significant benefits for reproducibility. However, many machine learning workflows involve more than just a single training script container. You might need a database to fetch training data or log metrics, a message queue to trigger jobs, or perhaps a separate service to manage experiment configurations. Managing the lifecycle, networking, and data persistence for multiple interconnected containers using only docker run commands quickly becomes cumbersome and error-prone.
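To make that concrete, here is a sketch of what wiring up just two cooperating containers by hand might involve (the network name train-net and image name training-image are illustrative, not from the example below):

# Create a user-defined bridge network so containers can reach each other by name
docker network create train-net

# Start the database on that network, with a named volume for its data
docker run -d --name db --network train-net \
  -e POSTGRES_DB=training_logs -e POSTGRES_USER=trainer -e POSTGRES_PASSWORD=secretpass \
  -v db_data:/var/lib/postgresql/data \
  postgres:14-alpine

# Start the training container with matching network, mounts, and connection settings
docker run --rm --name training --network train-net \
  -e DB_HOST=db -e DB_NAME=training_logs -e DB_USER=trainer -e DB_PASSWORD=secretpass \
  -v "$(pwd)/data:/app/data" -v model_output:/app/output \
  training-image

Every flag must be repeated correctly on every invocation, and nothing records this setup besides your shell history.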
This is where Docker Compose comes into play. Docker Compose is a tool specifically designed for defining and running multi-container Docker applications. Instead of issuing complex docker run commands for each container, you describe your entire application stack, including its services, networks, and volumes, in a single declarative YAML file, typically named docker-compose.yml.
Think of a scenario where your training script needs to:

- pull its experiment configuration from a dedicated configuration service,
- read training data mounted from the host, and
- log metrics and results to a PostgreSQL database.
Attempting to orchestrate this with individual docker run commands would involve manually setting up networks for communication, managing container startup order, and handling volume mounts consistently across containers. Docker Compose simplifies this significantly.
With Compose, you define each component (the training script container, the database container, the configuration service container) as a separate service within the docker-compose.yml file. Compose handles the underlying Docker operations, such as:
- creating a dedicated network on which services can reach one another by service name,
- creating named volumes and attaching the declared bind mounts,
- building images for services that specify a build context, and
- starting, stopping, and removing the containers as a unit.

A single docker-compose up starts all defined services in the correct dependency order (if specified), and docker-compose down stops and removes them cleanly.

Consider a simplified training setup involving just the training container and a PostgreSQL database for logging results. A docker-compose.yml file might look something like this:
version: '3.8' # Specifies the Compose file version

services:
  # Service for the training script
  training:
    build: . # Instructs Compose to build an image from the Dockerfile in the current directory
    volumes:
      - ./data:/app/data # Mount local data directory into the container
      - model_output:/app/output # Mount a named volume for model output
    environment:
      - DB_HOST=db # Use the service name 'db' as the hostname
      - DB_NAME=training_logs
      - DB_USER=trainer
      - DB_PASSWORD=secretpass
    depends_on: # Ensures the database starts before the training service
      - db
    # command: python train.py --config /app/config.yaml # Optional command override

  # Service for the PostgreSQL database
  db:
    image: postgres:14-alpine # Use a pre-built PostgreSQL image
    volumes:
      - db_data:/var/lib/postgresql/data # Mount a named volume for persistent DB data
    environment:
      - POSTGRES_DB=training_logs
      - POSTGRES_USER=trainer
      - POSTGRES_PASSWORD=secretpass

# Define named volumes for persistent storage
volumes:
  db_data:
  model_output:
In this example:

- Two services are defined: training and db.
- The training service builds its image using a local Dockerfile.
- The db service uses a public postgres image.
- volumes are used for input data (the ./data bind mount), output models (the model_output named volume), and persistent database storage (the db_data named volume).
- environment variables configure both services, notably providing the database connection details to the training service. DB_HOST is set to db, the service name of the PostgreSQL container, which Compose makes resolvable on the internal network.
- depends_on ensures the db service is started before the training service attempts to connect.

Diagram: services defined in a docker-compose.yml file. Services like training and db communicate over a shared network managed by Compose. Volumes (model_output, db_data) provide persistent storage. An optional configuration service is also shown.
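Before starting anything, you can have Compose validate the file and print the fully resolved configuration, which catches indentation and syntax mistakes early:

docker-compose config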
With the docker-compose.yml file defined, starting the entire stack is as simple as running:
docker-compose up --build
The --build flag tells Compose to rebuild the image for the training service before starting the containers. Without it, Compose reuses an existing image even if the Dockerfile or its build context has changed, so passing --build is the reliable way to pick up code changes; Docker's layer cache keeps the rebuild fast when nothing has actually changed.
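While the stack is running, the usual Compose subcommands operate per service. For example, with the services defined above:

# Follow the training service's logs
docker-compose logs -f training

# Open a psql shell inside the running database container
docker-compose exec db psql -U trainer -d training_logs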
To stop and remove the containers and the network defined in the Compose file, you use:
docker-compose down
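By default, down preserves named volumes such as db_data and model_output. To delete those as well, for instance when you want a completely clean slate, add the --volumes flag:

docker-compose down --volumes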
Using Docker Compose significantly simplifies the management of multi-container setups common in more developed ML training pipelines. It provides a standardized way to define and share development or testing environments that closely mirror aspects of production deployments, enhancing reproducibility. While powerful orchestration systems like Kubernetes are often used for large-scale distributed training, Docker Compose is an invaluable tool for local development, testing, and managing moderately complex training stacks. The following sections will detail specific techniques for containerizing training scripts and handling associated aspects like configuration and GPU usage.