Now that we understand the goal of containerization using Docker – packaging our application and its environment together – how do we actually tell Docker what to package? This is where the Dockerfile comes in.

Think of a Dockerfile as a recipe or a set of instructions. It's a simple text file, typically named Dockerfile (with a capital 'D' and no file extension), that contains a sequence of instructions Docker uses to build a specific image. Docker processes the instructions in order: those that change the filesystem, such as RUN, COPY, and ADD, each add a new layer on top of the previous one, while the others only record metadata in the image.

Let's look at some of the fundamental instructions you'll commonly use, especially when packaging a Python web application like our Flask prediction service.

Common Dockerfile Instructions

FROM: Every Dockerfile must begin with a FROM instruction; only comments, parser directives, and ARG instructions may appear before it. It specifies the base image upon which you are building. A base image provides a starting operating system and environment. For a Python application, you'll typically start from an official Python base image, which already includes a Linux distribution and a specific version of Python.
- Example: FROM python:3.9-slim specifies using a lightweight ("slim") version of the official image containing Python 3.9.
WORKDIR: This instruction sets the working directory for any subsequent RUN, CMD, ENTRYPOINT, COPY, and ADD instructions in the Dockerfile. If the directory doesn't exist, Docker will create it. It helps organize files within the container's filesystem.
- Example: WORKDIR /app sets the current directory inside the container to /app.
COPY: This instruction copies files or directories from your local machine (the build context) into the filesystem of the container image. You need this to get your application code, model files, and dependency lists into the image.
- Example: COPY requirements.txt . copies the requirements.txt file from your project directory into the current working directory (/app if set by WORKDIR) inside the image. The . refers to the current WORKDIR.
- Example: COPY . . copies everything from your project directory (the build context) into the current working directory inside the image. Be mindful of what you copy; often, it's better to copy specific files or use a .dockerignore file (similar to .gitignore) to exclude unnecessary files like virtual environments or local configuration.
RUN: This instruction executes commands during the image build process. It's commonly used to install software packages, including Python libraries. Each RUN instruction creates a new image layer.
- Example: RUN pip install --no-cache-dir -r requirements.txt uses pip (which is available because we started FROM a Python image) to install the packages listed in requirements.txt. The --no-cache-dir flag is often recommended in Dockerfiles to reduce image size by preventing pip from storing the download cache.
EXPOSE: This instruction informs Docker that the container will listen on the specified network ports at runtime. It acts primarily as documentation between the person who builds the image and the person who runs the container. It does not actually publish the port or make it accessible from the host machine; that's done when you run the container using the -p or -P flag with the docker run command.
- Example: EXPOSE 5000 indicates that the application inside the container is expected to listen on port 5000 (the default port for the Flask development server).
CMD: This instruction specifies the default command to execute when a container is started from the image. Only one CMD takes effect per Dockerfile: if you list more than one, only the last is used. The command can be overridden when you start the container, for example by passing a different command after the image name in docker run.
- Example: CMD ["python", "app.py"] sets the default command to run the Flask application script app.py using the Python interpreter. This is the preferred "exec" form of CMD.
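To make the .dockerignore advice above concrete, here is a minimal sketch of such a file, placed next to the Dockerfile at the root of the build context. The entries are assumptions about a typical Python project (directory names such as venv and .venv are hypothetical), so adjust them to whatever your project actually contains.

```text
# .dockerignore (illustrative; adjust to your project)

# Version control metadata
.git

# Python bytecode caches, at any depth in the project
**/__pycache__
**/*.pyc

# Local virtual environments (directory names are assumptions)
venv
.venv

# Local configuration and secrets
.env
```

With a file like this in place, COPY . . skips the listed paths, which keeps the image smaller and avoids copying local secrets into it.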

Preparing Dependencies: `requirements.txt`

Before writing the Dockerfile, it's essential to list all the Python libraries your application depends on. The standard way to do this is by creating a file named requirements.txt in your project's root directory. For our Flask prediction service, this file might look something like this:

```text
# requirements.txt
flask
scikit-learn
joblib
numpy
# Add any other libraries your specific model or preprocessing requires
```

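This file allows the RUN pip install -r requirements.txt command in the Dockerfile to install the same set of libraries inside the container image. Note that the entries above carry no version numbers, so pip installs whatever release of each library is newest at build time. For builds that are consistent from one day to the next, pin each library to a specific version using the name==version syntax.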
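Because unpinned names resolve to whatever release is newest when the image is built, one option is a pinned variant of the file. The sketch below is illustrative only: the version numbers are assumptions chosen as an example, not versions this service is known to use. In practice, pin to the versions installed in the environment where the model was trained (pip freeze lists them).

```text
# requirements.txt (pinned variant; version numbers are illustrative)
flask==2.2.5
scikit-learn==1.2.2
joblib==1.2.0
numpy==1.24.4
```

Pinning matters most for scikit-learn, since a model saved with joblib is not guaranteed to load correctly under a different scikit-learn version.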

Example `Dockerfile` for the Flask Service

Let's put these instructions together into a simple Dockerfile suitable for the Flask prediction service we developed in the previous chapter. Assume your project directory contains your Flask app script (e.g., app.py), your saved model file (e.g., model.joblib), any saved preprocessors, and the requirements.txt file.

```dockerfile
# Dockerfile

# Start from a lightweight Python base image
FROM python:3.9-slim

# Set the working directory inside the container
WORKDIR /app

# Copy the requirements file first to leverage Docker cache
COPY requirements.txt .

# Install Python dependencies
# Using --no-cache-dir reduces image size
RUN pip install --no-cache-dir -r requirements.txt

# Copy the rest of the application code into the working directory
COPY . .

# Inform Docker that the container listens on port 5000
EXPOSE 5000

# Define the command to run the application when the container starts
CMD ["python", "app.py"]
```

Explanation:

FROM python:3.9-slim: We start with a Python 3.9 environment.
WORKDIR /app: We set the default directory inside the container to /app.
COPY requirements.txt .: We copy only the requirements file first. Docker builds images in layers and caches them. If requirements.txt hasn't changed, Docker can reuse the layer created by the next RUN instruction, speeding up subsequent builds if only the application code changes.
RUN pip install ...: We install the dependencies listed in requirements.txt.
COPY . .: Now, we copy the rest of our project files (like app.py and model.joblib) into the /app directory inside the image.
EXPOSE 5000: We document that our Flask app will run on port 5000.
CMD ["python", "app.py"]: We specify that running python app.py should be the default action when a container starts from this image.
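One caveat on the final CMD: Flask's development server binds to 127.0.0.1 by default, which inside a container means it cannot be reached through a published port unless the app listens on all interfaces. If app.py does not already call app.run(host="0.0.0.0", port=5000), one alternative is to start the server through the flask command line instead. The sketch below assumes the Flask object in app.py is named app (an assumption about the previous chapter's code) and that the installed Flask is version 2.2 or later, where the --app option is available.

```dockerfile
# Alternative final instruction (a sketch, not the only option):
# start the Flask development server on all interfaces so that a port
# published with `docker run -p 5000:5000 <image>` can reach it.
# Assumes app.py defines a Flask object named `app`.
CMD ["flask", "--app", "app", "run", "--host=0.0.0.0", "--port=5000"]
```

This replaces the CMD ["python", "app.py"] line; the rest of the Dockerfile stays the same.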

This simple text file provides Docker with all the necessary steps to create a self-contained image holding our Python environment, dependencies, and application code. The next step is to use this Dockerfile to actually build the Docker image.
