Incorporating project code and related artifacts into a Docker image ensures the container has everything it needs to execute Machine Learning tasks, such as training models or running inference services. This step typically follows the definition of the base environment and installation of dependencies within the Dockerfile. The main instructions for this are COPY and ADD.
The COPY instruction is the most straightforward and commonly used method for getting files from your local machine (specifically, the build context) into the image filesystem. Its syntax is simple:
COPY <src>... <dest>
<src>: Specifies the file or directory on your local machine (relative to the build context root) that you want to copy. You can specify multiple sources, and wildcards are supported (an example follows the walkthrough below).
<dest>: Specifies the path inside the container image where the source files/directories should be copied. If the destination doesn't exist, Docker will create it. If you specify multiple sources (or a wildcard that may match several files), the destination must be a directory and end with a /.
For a typical ML project structured with a src directory containing Python code and a requirements.txt file in the root, you might use COPY like this:
# Define the working directory (covered previously)
WORKDIR /app
# Copy the requirements file first to leverage caching
COPY requirements.txt .
# Install dependencies (covered previously)
RUN pip install --no-cache-dir -r requirements.txt
# Copy the rest of the project code
COPY ./src ./src
In this example:
WORKDIR /app sets the working directory to /app.
COPY requirements.txt . copies requirements.txt from the build context root to the /app directory inside the image.
RUN pip install installs the dependencies listed in requirements.txt.
COPY ./src ./src copies the local src directory into a directory named src within the /app directory inside the image (i.e., /app/src).
This order is significant for build performance, as we'll discuss shortly regarding build caching.
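As noted above, COPY also supports wildcards. A minimal sketch, assuming a hypothetical configs directory that holds YAML files alongside other material you don't want in the image:
# Copy only the YAML files from configs/; because the wildcard may match
# several files, the destination must be a directory ending with /
COPY configs/*.yaml ./configs/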
Docker also provides the ADD instruction, which has similar syntax to COPY:
ADD <src>... <dest>
ADD performs the same function as COPY for local files and directories but includes two additional features:
If <src> is a URL, Docker will download the file from the URL and copy it to <dest>. Permissions are set to 600.
If <src> is a local archive in a recognized compression format (such as tar, gzip, bzip2, or xz), Docker will automatically unpack it into <dest> as a directory. Note that this auto-extraction applies only to local archives, not to files downloaded from URLs.
While these features might seem convenient, they can also make your builds less predictable. For instance, downloading from URLs during the build couples your image build process to network availability and the stability of the remote source. Auto-extraction can sometimes have unexpected results depending on the archive structure.
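To illustrate both behaviors, a short sketch; the URL and archive name are placeholders, not real artifacts:
# Downloads the file at build time; it is stored as-is, not unpacked
ADD https://example.com/weights/model.bin /opt/model.bin
# A local archive in the build context is unpacked into /opt/vendor/
ADD vendor-libs.tar.gz /opt/vendor/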
Recommendation: For clarity and predictability, prefer COPY over ADD for transferring local files and directories. Use ADD only if you specifically need its URL download or auto-extraction capabilities, and understand the potential implications. For downloading files, often it's better practice to use tools like curl or wget within a RUN instruction, providing more control over the download process (e.g., error handling, retries).
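For example, a minimal sketch of such a controlled download, assuming curl is available in the base image (the URL and destination path are placeholders):
# --fail turns HTTP errors into a non-zero exit code, failing the build
# instead of silently saving an error page; --retry handles transient
# network hiccups
RUN curl --fail --silent --show-error --retry 3 \
    -o /opt/model.bin \
    https://example.com/weights/model.bin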
When you run docker build, the Docker client first bundles the directory specified as the build context (usually .) and sends it to the Docker daemon. This context includes all files and subdirectories. Sending large, unnecessary files (like datasets, virtual environments, Git history, IDE configuration) slows down the build process and can bloat your image if accidentally copied.
To prevent this, create a file named .dockerignore in the root of your build context (the same directory as your Dockerfile). List files or directories you want to exclude, using syntax similar to .gitignore.
Here's an example .dockerignore for a typical ML project:
# Git files
.git
.gitignore
# Python virtual environment
venv/
*.pyc
__pycache__/
# IDE / Editor specific files
.vscode/
.idea/
*.swp
# Large data files (manage via volumes/mounts later)
data/
datasets/
# Model artifacts (if large or managed separately)
models/
*.pt
*.h5
*.onnx
*.pkl
# Docker files
Dockerfile
.dockerignore
# Other temporary or local files
*.log
notebooks/output/
By using .dockerignore, you ensure that only the essential code and configuration files are sent to the daemon and are available to be copied into your image via COPY or ADD. This leads to faster builds and smaller, more secure images.
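One quick way to check that your exclusions are working is to inspect the built image, assuming a Dockerfile that ends with COPY . . (the ml-app tag is a placeholder):
docker build -t ml-app .
# List what actually landed in the image; ignored paths such as data/,
# venv/, and .git should be absent
docker run --rm ml-app ls -a /app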
Docker builds images in layers. Each instruction in the Dockerfile (like RUN, COPY, ADD) creates a new layer. Docker utilizes a build cache: if the files related to a COPY instruction haven't changed since the last build, and the preceding layers are also cached, Docker reuses the existing layer instead of executing the instruction again.
This is particularly important for time-consuming steps like dependency installation. Consider these two approaches:
Less Optimal:
WORKDIR /app
# Copy everything at once (comments must be on their own line in a Dockerfile)
COPY . .
RUN pip install -r requirements.txt
# ... rest of Dockerfile
If any file in your project changes (even a minor code edit), the cache for the COPY . . layer is invalidated, and Docker must re-run it and every subsequent instruction, including the potentially lengthy pip install, even if requirements.txt didn't change.
More Optimal:
WORKDIR /app
# 1. Copy only the requirements file
COPY requirements.txt .
# 2. Install dependencies
RUN pip install --no-cache-dir -r requirements.txt
# 3. Copy the rest of the application code
COPY . .
# ... rest of Dockerfile
In this improved version:
If only application code (e.g., .py files) changes, Docker reuses the cached layers up to and including the RUN pip install step (because requirements.txt hasn't changed). It only needs to re-execute the final COPY . . instruction, which is very fast.
The expensive dependency installation re-runs only when requirements.txt itself is modified.
Structuring your COPY instructions thoughtfully, separating dependency definitions from application code, significantly speeds up iterative development cycles.
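You can observe this behavior directly by building the image twice; with BuildKit (the default builder in current Docker versions), reused layers are labeled CACHED in the build output. The ml-app tag is a placeholder:
docker build -t ml-app .   # first build: every step executes
# edit a file under src/, then rebuild
docker build -t ml-app .   # steps up to and including pip install report CACHED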
Should you COPY pre-trained models, datasets, or other large artifacts directly into your image?
Generally, for datasets and larger or frequently updated model artifacts, copying them directly into the image is not the recommended approach. Alternative strategies using Docker volumes or bind mounts provide more flexibility and efficiency. These techniques allow you to decouple the data and artifacts from the image itself, making updates easier and keeping images smaller. We will explore these data management techniques in detail in Chapter 3, "Managing Data and Models in Containers".
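As a preview of those techniques, a minimal sketch that supplies data and model artifacts through bind mounts at run time instead of baking them into the image (the image tag, host paths, and script name are all placeholders):
# Mount local data read-only and a writable models directory,
# keeping both out of the image itself
docker run --rm \
  -v "$(pwd)/data:/app/data:ro" \
  -v "$(pwd)/models:/app/models" \
  ml-app \
  python src/train.py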
For now, understand that COPY and ADD are essential for getting your application code and configuration into the image. Use .dockerignore to exclude unnecessary files, and structure your COPY operations to maximize the benefits of Docker's build cache.