As we discussed earlier in the course (Chapter 3), containers are ephemeral by default. Any data written inside a container's filesystem is lost when the container is removed. For multi-container applications, managing persistent data like datasets, trained models, logs, or database files becomes essential. While `docker run` allows mounting volumes using the `-v` flag, Docker Compose provides a more structured and manageable way to define and attach volumes to your services.
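For comparison, here is what the same idea looks like with plain `docker run`; this is only a minimal sketch, and the image name is a placeholder:

```bash
# Create a named volume, then attach it manually with the -v flag.
docker volume create ml_models
docker run -d -v ml_models:/app/models my-inference-image:latest  # image name is illustrative
```

Every container that needs the volume has to repeat this flag by hand, which is exactly the bookkeeping Compose takes over.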
In Docker Compose, you manage volumes in two main steps:

1. Declare the volume at the top level of your `docker-compose.yml` file under the `volumes:` key. This tells Compose about the volumes your application stack requires.
2. Mount the volume into each service that needs it, using a `volumes:` key specific to that service.

Named volumes are the preferred mechanism for persisting data generated by and used by Docker containers. Docker manages the storage area on the host machine, and you only need to refer to the volume by its name.
To declare a named volume, add a top-level `volumes:` section to your `docker-compose.yml`:
```yaml
version: '3.8' # Or a later version

services:
  # ... your service definitions go here ...

volumes:
  postgres_data: # Declares a named volume called 'postgres_data'
  ml_models:     # Declares another named volume called 'ml_models'
```
In this example, we've declared two named volumes: `postgres_data` and `ml_models`. Compose will create these volumes automatically the first time you run `docker-compose up` if they don't already exist on your Docker host. You can also specify driver options here if needed, but the default local driver is usually sufficient.
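If you do need non-default behavior, driver options can be set in the same place. As a sketch only, the snippet below tells the local driver to bind the volume to a specific host directory; the path `/srv/ml_models` is an assumption and must already exist on the host:

```yaml
volumes:
  ml_models:
    driver: local              # the default driver, shown explicitly for illustration
    driver_opts:
      type: none
      o: bind
      device: /srv/ml_models   # hypothetical host directory backing the volume
```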
Once a volume is declared, you can mount it into one or more services. Under the specific service definition, use the `volumes:` key (note: this is different from the top-level key). The syntax is typically `volume-name:/path/in/container`.
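Compose also supports a more explicit long form for mounts, which is handy when you need extra options such as a read-only mount. The service and volume names below are illustrative:

```yaml
services:
  inference_api:
    volumes:
      - type: volume
        source: ml_models    # must be declared under the top-level volumes: key
        target: /app/models
        read_only: true      # the service only reads models, so mount read-only
```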
Let's extend the previous example. Imagine you have a PostgreSQL database service for storing experiment metadata and an inference API service that needs access to trained models.
```yaml
version: '3.8'

services:
  db:
    image: postgres:14-alpine
    environment:
      POSTGRES_USER: user
      POSTGRES_PASSWORD: password
      POSTGRES_DB: ml_metadata
    volumes:
      - postgres_data:/var/lib/postgresql/data # Mount 'postgres_data' volume

  inference_api:
    build: ./inference_service # Assumes a Dockerfile here
    ports:
      - "8000:8000"
    volumes:
      - ml_models:/app/models # Mount 'ml_models' volume
    depends_on:
      - db

volumes:
  postgres_data:
  ml_models:
```
Here's what's happening:

- `db` service: We mount the `postgres_data` volume (declared at the top level) to the standard PostgreSQL data directory `/var/lib/postgresql/data` inside the container. All database files created by PostgreSQL will now persist in this Docker-managed volume, surviving container restarts or removals.
- `inference_api` service: We mount the `ml_models` volume to `/app/models` inside the container. If a training service (perhaps run separately or defined in the same Compose file, as sketched below) saves models into this volume, the `inference_api` service can load them from `/app/models`.

This setup ensures data persistence and allows data sharing between containers if needed (though in this specific example, each volume is used by one service).
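If you did define such a training service in the same file, it would simply mount the same named volume. The service name, build context, and command in this sketch are assumptions:

```yaml
services:
  trainer:
    build: ./training_service                           # hypothetical Dockerfile location
    command: python train.py --output-dir /app/models   # hypothetical training entrypoint
    volumes:
      - ml_models:/app/models                           # same named volume mounted by inference_api
```

Anything the trainer writes under `/app/models` becomes visible to `inference_api` through the shared volume.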
Diagram showing two services defined in Docker Compose, each mounting a distinct named volume managed by Docker.
Managing volumes through Docker Compose offers several benefits:

- Declarative configuration: volumes are defined alongside the services that use them in a single file (`docker-compose.yml`), making the configuration easy to understand and manage.
- Automatic creation: named volumes are created for you the first time you start the stack (`docker-compose up`).
- Straightforward cleanup: named volumes can be removed together with the rest of the stack using `docker-compose down -v`. This simplifies cleanup; a short command sketch follows this list.
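The commands below sketch that lifecycle, assuming the example file above:

```bash
docker-compose up -d     # creates postgres_data and ml_models if they don't already exist
docker volume ls         # the named volumes appear, prefixed with the Compose project name
docker-compose down      # removes containers and networks but keeps named volumes
docker-compose down -v   # also removes the named volumes declared in the file
```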
In ML workflows managed with Compose, volumes are frequently used for:

- Persisting datasets that containers read during training or evaluation.
- Sharing trained model artifacts between training and inference services.
- Storing database files, such as the experiment metadata database in the example above.
- Retaining logs and other outputs beyond the lifespan of a single container.
By defining and mounting volumes within your `docker-compose.yml` file, you gain a robust and manageable way to handle persistent data for your multi-container ML applications, ensuring that important information like models, datasets, and configurations survives beyond the lifespan of individual containers. This approach simplifies setup, enhances reproducibility, and aligns well with containerization best practices.