Once you've defined how to build your ML environment using a Dockerfile and actually built an image on your local machine, the next step is making that image available elsewhere or obtaining pre-built images created by others. This is where Docker registries and repositories come into play. Think of a registry as a centralized storage and distribution system for Docker images, much like PyPI serves Python packages or GitHub hosts code repositories.
A Docker registry is a service that stores and distributes Docker images. It can be public or private. The most well-known public registry is Docker Hub, which serves as the default registry for the Docker command-line tool.
Within a registry, images are organized into repositories. A repository holds different versions (tags) of a specific image. For example, Docker Hub hosts repositories for official Python images (python), popular databases like PostgreSQL (postgres), and, importantly for ML, core frameworks like TensorFlow (tensorflow/tensorflow) and PyTorch (pytorch/pytorch).
The standard naming convention for a Docker image follows this pattern:
[registry_host/][username_or_organization/]repository_name[:tag]
- registry_host: Specifies the registry if it's not Docker Hub (e.g., gcr.io for Google Container Registry, nvcr.io for NVIDIA GPU Cloud). If omitted, Docker assumes Docker Hub.
- username_or_organization: Identifies the owner of the repository on the registry (e.g., tensorflow in tensorflow/tensorflow). For official images on Docker Hub (like python or ubuntu), this part is often omitted.
- repository_name: The actual name of the image set (e.g., tensorflow, python).
- tag: A specific version or variant identifier (e.g., latest, 3.9-slim, 2.11.0-gpu). If omitted, Docker defaults to the latest tag.

For instance:
- python:3.9: The official Python image, tag 3.9, from Docker Hub.
- tensorflow/tensorflow:2.11.0-gpu: The TensorFlow image, version 2.11.0 with GPU support, from the tensorflow organization on Docker Hub.
- nvcr.io/nvidia/pytorch:23.10-py3: The PyTorch image, tag 23.10-py3, from the nvidia organization on the NVIDIA GPU Cloud registry (nvcr.io).

Docker Hub is the default and largest public registry. It hosts a vast collection of images, including:
- Official images such as ubuntu, debian, alpine, and python, maintained by Docker or upstream projects. These are often starting points for custom ML environments.
- Framework images such as tensorflow/tensorflow and pytorch/pytorch, providing ready-to-use environments with specific framework versions and sometimes GPU support.

Using images from Docker Hub is fundamental for ML practitioners. Instead of building everything from scratch, you can leverage base images that already include Python, specific ML libraries, or even CUDA drivers for GPU acceleration.
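For example, a project Dockerfile can start from one of these framework images instead of installing everything itself. A minimal sketch, assuming hypothetical requirements.txt and train.py files in your project directory:

# Hypothetical Dockerfile building on a pinned TensorFlow base image
FROM tensorflow/tensorflow:2.11.0-gpu

WORKDIR /app

# Install project-specific dependencies on top of the framework image
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Add the training script and set it as the default command
COPY train.py .
CMD ["python", "train.py"]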
Tags are essential for reproducibility. The latest tag is convenient but often points to the most recently pushed version, which can change unexpectedly. In ML, consistency is critical: relying on latest can lead to builds failing or models behaving differently if the underlying image changes.
Best practice dictates using specific tags that denote versions or specific configurations:
- python:3.9.18-slim-bookworm: Specifies Python version 3.9.18, the slim variant (smaller size), based on Debian bookworm.
- tensorflow/tensorflow:2.12.0-gpu: Specifies TensorFlow version 2.12.0 with GPU support.
- pytorch/pytorch:2.0.1-cuda11.7-cudnn8-runtime: Specifies PyTorch 2.0.1, built for CUDA 11.7 and cuDNN 8, containing only the runtime components.

Always use explicit tags in your Dockerfiles (FROM python:3.9-slim) and when running containers to ensure you are using the exact environment you intend.
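For even stricter reproducibility, you can pin an image by its immutable content digest, which identifies the exact image bytes regardless of how tags move. A brief sketch; the digest shown is a placeholder, so substitute the value reported for your own pull:

# Show the content digests of locally stored python images
docker images --digests python

# Pull by digest instead of by tag (placeholder digest shown);
# a Dockerfile FROM line accepts the same python@sha256:... form
docker pull python@sha256:<digest-from-the-listing-above>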
The docker pull Command

To download an image from a registry to your local machine, you use the docker pull command. Docker automatically checks your local storage first; if the image (with the specific tag) isn't present, it contacts the appropriate registry (Docker Hub by default) and downloads it.
# Pull the official Python 3.9 slim image from Docker Hub
docker pull python:3.9-slim
# Pull a specific TensorFlow GPU image from Docker Hub
docker pull tensorflow/tensorflow:2.11.0-gpu
# Pull an image from a different registry (NVIDIA)
docker pull nvcr.io/nvidia/cuda:12.1.1-base-ubuntu22.04
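After a pull completes, you can confirm the image is stored locally:

# List locally stored images for the python repository
docker images python

# Or list every local image with its tag and size
docker images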
If you run docker run with an image name that isn't available locally, Docker will implicitly perform a docker pull first.
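For example, the following works even on a machine that has never pulled the image; Docker downloads python:3.9-slim automatically before starting the container:

# Implicit pull: downloads python:3.9-slim if it is not cached locally,
# then runs a container that prints the interpreter version and exits
docker run --rm python:3.9-slim python --version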
The docker push Command

After building a custom image tailored for your ML project (e.g., containing your specific dependencies or even a trained model), you might want to share it with colleagues or deploy it to a server. This is done using the docker push command.
Before you can push, you need to:

1. Tag the image appropriately. The local image name must match the repository path it will have on the registry. Use docker tag to rename or alias an existing local image.
# Assume you built an image locally named 'my-sklearn-app:1.0'
# Tag it for pushing to Docker Hub under username 'myusername'
docker tag my-sklearn-app:1.0 myusername/my-sklearn-app:1.0
2. Authenticate with the registry using the docker login command. For Docker Hub, it will prompt for your username and password (or an access token).
docker login
# Enter username and password/token when prompted
For other registries, provide the registry host: docker login myregistry.example.com.

Once tagged and authenticated, you can push the image:
# Push the tagged image to Docker Hub
docker push myusername/my-sklearn-app:1.0
The image layers will be uploaded to the registry, making it available for others (or your deployment systems) to pull.
Figure: Workflow for sharing a custom Docker image using a registry.
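Putting the steps together, a typical end-to-end sharing workflow looks like the sketch below (myusername and my-sklearn-app are placeholder names carried over from the example above):

# 1. Build the image locally from a Dockerfile in the current directory
docker build -t my-sklearn-app:1.0 .

# 2. Tag it with the repository path it will have on the registry
docker tag my-sklearn-app:1.0 myusername/my-sklearn-app:1.0

# 3. Authenticate, then upload the image layers
docker login
docker push myusername/my-sklearn-app:1.0

# 4. On another machine, download and run the shared image
docker pull myusername/my-sklearn-app:1.0
docker run --rm myusername/my-sklearn-app:1.0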
While Docker Hub is excellent for public and open-source projects, you'll often need a private registry for proprietary code, sensitive data handling configurations, or internal company use. Major cloud providers offer managed registry services, such as Amazon Elastic Container Registry (ECR), Google Artifact Registry, and Azure Container Registry (ACR).

You can also host your own registry using software like Harbor or the official Docker Registry image. Access control and security are the primary reasons for using private registries in commercial or research settings. Authentication (docker login) works similarly, but you'll use credentials specific to the private registry provider.
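As a concrete illustration, many managed registries use short-lived tokens piped into docker login. A sketch for Amazon ECR, assuming the AWS CLI is installed and configured (the account ID and region below are placeholders):

# Obtain a temporary token and pass it to docker login
# (123456789012 and us-east-1 are placeholder values)
aws ecr get-login-password --region us-east-1 | \
  docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com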
Understanding registries and repositories is fundamental to leveraging Docker effectively in ML. They enable sharing standardized environments, accessing pre-built tools, and forming the backbone of container-based deployment strategies, ensuring that the environment built locally is the same one that runs elsewhere.