As we discussed, containers provide isolated environments, but this isolation presents a challenge when you need to access or modify files that live outside the container, particularly during the active development phase of your Machine Learning project. Rebuilding a Docker image every time you tweak a line of Python code or update a configuration file is inefficient and significantly slows down the iteration cycle.
This is where bind mounts come into play. A bind mount establishes a direct link, or mapping, between a directory or file on your host machine (your computer's filesystem) and a directory or file inside the container. Unlike the `COPY` instruction in a Dockerfile, which creates a static snapshot of your files at build time, a bind mount provides a live, dynamic connection. Changes made to the mounted directory or file on the host are immediately reflected inside the container, and conversely (depending on permissions), changes made inside the container's mounted path can affect the host's files.
Think of it like creating a shared folder between your host machine and the container. This direct access is extremely useful during development: you can edit code and data on the host while running them inside the container.
You create bind mounts when you start a container using the `docker run` command. There are two primary flags for this:

The `-v` or `--volume` flag (simpler syntax):
```bash
docker run -v /path/on/host:/path/in/container image_name [command]
```
This syntax maps `/path/on/host` from your computer directly to `/path/in/container` inside the container.
The `--mount` flag (more explicit syntax):
```bash
docker run --mount type=bind,source=/path/on/host,target=/path/in/container image_name [command]
```
This syntax is often preferred for its clarity. It explicitly states that the `type` is `bind`, the `source` is the host path, and the `target` is the container path. You can add other options, like `readonly`, if needed.
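To see how these option strings fit together, here is a small, hypothetical Python helper (not part of the Docker CLI or SDK) that assembles a `--mount` argument, including the optional `readonly` flag:

```python
def bind_mount_args(source, target, readonly=False):
    # Assemble the --mount flag and its comma-separated options,
    # mirroring the syntax shown above. Illustrative helper only;
    # Docker itself parses this option string.
    opts = f"type=bind,source={source},target={target}"
    if readonly:
        opts += ",readonly"
    return ["--mount", opts]

# For example, a read-only mount of host data into a container:
print(bind_mount_args("/home/user/data", "/app/data", readonly=True))
```

Mounting data directories read-only is a common safeguard: the container can read the files but cannot modify them on the host.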
Let's look at a common ML development scenario. Suppose you have a project structure on your host machine like this:
```
/home/user/my_ml_project/
├── src/
│   └── train.py
├── data/
│   └── features.csv
└── Dockerfile
```
You've built an image named `my-ml-dev-env` from the `Dockerfile`, which installs Python, scikit-learn, and pandas. Now you want to run the `train.py` script inside the container, using the `features.csv` data, while still being able to edit `train.py` on your host.
You can achieve this using a bind mount:
```bash
# Using the --mount syntax
docker run --rm -it \
  --mount type=bind,source=/home/user/my_ml_project/src,target=/app/src \
  --mount type=bind,source=/home/user/my_ml_project/data,target=/app/data \
  my-ml-dev-env \
  python /app/src/train.py --data_path /app/data/features.csv
```
Or using the `-v` shorthand:

```bash
# Using the -v syntax
docker run --rm -it \
  -v /home/user/my_ml_project/src:/app/src \
  -v /home/user/my_ml_project/data:/app/data \
  my-ml-dev-env \
  python /app/src/train.py --data_path /app/data/features.csv
```
In both examples:

- The first mount maps your host `src` directory to `/app/src` inside the container.
- The second mount maps your host `data` directory to `/app/data` inside the container.
- The container runs the `train.py` script located within the container's mapped path `/app/src`.
- The script reads its data from `/app/data`.

Now, if you open `/home/user/my_ml_project/src/train.py` on your host machine and make changes, those changes are effective the next time you run the `docker run` command (and if you were running an interactive shell or server within the container, they would be reflected immediately upon accessing the file).
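For reference, the `train.py` invoked in these commands could be as simple as the following sketch (a hypothetical stand-in, since the actual script's contents are not shown here); it reads the mounted CSV via the `--data_path` argument:

```python
import argparse
import csv

def parse_args(argv=None):
    # Matches the command-line interface used in the docker run examples.
    parser = argparse.ArgumentParser(description="Toy training entry point")
    parser.add_argument("--data_path", required=True,
                        help="Path to the features CSV inside the container")
    return parser.parse_args(argv)

def load_features(path):
    # Read rows from the bind-mounted data directory.
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

if __name__ == "__main__":
    args = parse_args()
    rows = load_features(args.data_path)
    print(f"Loaded {len(rows)} rows from {args.data_path}")
```

Because `/app/src` is a bind mount, saving a change to this file on the host and re-running the `docker run` command executes the updated script with no image rebuild.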
*A diagram illustrating how directories on the host machine (`my_ml_project/src`, `my_ml_project/data`) are mapped directly into the container's filesystem (`/app/src`, `/app/data`) using bind mounts.*
While powerful for development, keep these points in mind:

- Bind mounts are specified at runtime in the `docker run` command. This makes images less portable if they rely heavily on specific host paths being mounted.
- Host path formats differ between Windows (`C:\Users\...`) and Linux/macOS (`/home/user/...`). Be mindful of this when sharing `docker run` commands or scripts across different development environments.

Bind mounts are an indispensable tool for streamlining the inner loop of ML development: editing code, running experiments, and accessing local resources within the consistent environment provided by your Docker container. They bridge the gap between your host machine and the containerized environment precisely when you need that direct, real-time connection. However, for managing persistent data more robustly, especially in non-development scenarios or when sharing data between containers, Docker offers another mechanism: volumes, which we will explore next.
© 2025 ApX Machine Learning