Machine learning workflows rely heavily on data. You need datasets to train your models, you might generate intermediate checkpoints during long training runs, and ultimately you need to save the trained model artifacts. When working with containers, understanding how data is handled is fundamental, because by default, data written inside a container does not outlive that container.

Let's look at what happens when a container runs. Docker creates a writable layer on top of the read-only image layers. Think of the image as a blueprint and the container as a running instance built from that blueprint. Any changes the running application makes, such as creating files, modifying existing ones, or downloading data, are written to this specific container's writable layer.

The important point here is that this writable layer is ephemeral. It is tied to the life of that single container instance. When the container is stopped and removed (which happens frequently, for example when updating an application or simply cleaning up resources), this writable layer, along with all the data it contains, is permanently deleted. Imagine downloading a large dataset or completing hours of model training, only to have the results vanish when the container is removed. This clearly won't work for most ML tasks.

We need ways to store data persistently, independent of the container's lifecycle. We need mechanisms that provide:

- Data Persistence: Ensuring datasets, logs, model checkpoints, and final models survive container restarts or removals.
- Data Sharing: Making data from the host machine available inside the container (like source code or local datasets during development).
- Data Decoupling: Separating the application logic (in the container image) from the data it operates on.

Docker provides two primary mechanisms to achieve this, which form the core of managing data in containerized ML applications:

- Bind Mounts: These directly map a file or directory from your host machine's filesystem into the container.
  Changes made in the container are reflected on the host, and vice versa.
- Volumes: These are storage areas managed by Docker itself. They are stored within a specific part of the host filesystem managed by Docker, but their exact location isn't something you typically interact with directly. Volumes are the preferred method for persisting container-generated data.

The following diagram illustrates the relationship between the container's ephemeral layer and these persistent storage options.

[Diagram: On the Host Machine Filesystem, a Host Directory (/path/to/data) is linked to the container mount point /data_bind by a bind mount (direct link), and a Docker Managed Area (e.g., /var/lib/docker/volumes) is linked to the mount point /data_volume by a volume (Docker managed). Inside the container, the Container Writable Layer is ephemeral and lost on removal.]

Data within the Container Writable Layer is lost when the container is removed.
Bind mounts link directly to the host filesystem, while volumes provide Docker-managed persistent storage.

Understanding this distinction between the container's internal, ephemeral storage and externally mounted persistent storage is the first step towards effectively managing datasets, models, and other artifacts in your containerized ML projects. In the following sections, we will examine bind mounts and volumes in detail, discussing their use cases, advantages, and disadvantages for different ML scenarios.
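To make the lifecycle difference concrete, here is a toy Python model of the storage layout described above. It is a conceptual sketch only. It is not how Docker is implemented and it is not Docker's API. The class names (`ToyDockerHost`, `ToyContainer`), the dict-based "filesystems", and the simple path-prefix routing are all hypothetical simplifications. Writes under a mount point go either to a host directory (bind mount) or to the Docker-managed volume store (volume). Every other write lands in the container's writable layer, and that layer is discarded when the container is removed.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDockerHost:
    """Hypothetical stand-in for a Docker host (illustration only, not a real API)."""
    host_fs: dict = field(default_factory=dict)       # host directories: {host_dir: {filename: data}}
    volume_store: dict = field(default_factory=dict)  # Docker-managed area: {volume_name: {filename: data}}

    def run(self, image_files, bind_mounts=None, volumes=None):
        """Start a toy container.

        bind_mounts maps container paths to host directories;
        volumes maps container paths to named, Docker-managed volumes.
        """
        return ToyContainer(self, image_files, bind_mounts or {}, volumes or {})


class ToyContainer:
    def __init__(self, host, image_files, bind_mounts, volumes):
        self.image_files = dict(image_files)  # read-only image layers (never written to)
        self.writable_layer = {}              # ephemeral, belongs to this container only
        self.mounts = {}                      # {mount_point: backing storage dict}
        for mount_point, host_dir in bind_mounts.items():
            # Bind mount: point straight at a host directory (created here if missing, for simplicity).
            self.mounts[mount_point] = host.host_fs.setdefault(host_dir, {})
        for mount_point, name in volumes.items():
            # Volume: storage lives in the Docker-managed area, keyed by volume name.
            self.mounts[mount_point] = host.volume_store.setdefault(name, {})

    def _resolve(self, path):
        """Return (storage, key): mounted storage if path is under a mount point, else the writable layer."""
        for mount_point, storage in self.mounts.items():
            prefix = mount_point.rstrip("/") + "/"
            if path.startswith(prefix):
                return storage, path[len(prefix):]
        return self.writable_layer, path

    def write(self, path, data):
        storage, key = self._resolve(path)
        storage[key] = data  # writes never touch the image layers

    def read(self, path):
        storage, key = self._resolve(path)
        if key in storage:
            return storage[key]
        if storage is self.writable_layer and path in self.image_files:
            return self.image_files[path]  # unmodified files come from the image
        raise FileNotFoundError(path)

    def remove(self):
        # Removing the container discards its writable layer; mounted storage is left alone.
        self.writable_layer.clear()


# Demo: one container writes to all three locations, then is removed.
host = ToyDockerHost()
image = {"/app/train.py": "training code"}
c1 = host.run(image, bind_mounts={"/data_bind": "/path/to/data"},
              volumes={"/data_volume": "ml_models"})
c1.write("/app/checkpoint.pt", "epoch 3")        # container writable layer
c1.write("/data_bind/dataset.csv", "rows")       # host directory via bind mount
c1.write("/data_volume/model.pt", "weights")     # Docker-managed volume
c1.remove()

# A fresh container from the same image sees the volume, but not the old writable layer.
c2 = host.run(image, volumes={"/data_volume": "ml_models"})
print(c2.read("/data_volume/model.pt"))              # weights
print(host.host_fs["/path/to/data"]["dataset.csv"])  # rows
try:
    c2.read("/app/checkpoint.pt")
except FileNotFoundError:
    print("checkpoint lost with the writable layer")
```

With the real Docker CLI, the two mounts in this sketch correspond roughly to `docker run -v /path/to/data:/data_bind -v ml_models:/data_volume <image>`. Removing that container with `docker rm` discards its writable layer, while the host directory and the named volume remain.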