Once you have packaged your training environment into a Docker image, the next challenge is running specific training jobs without modifying the image for every experiment. Hardcoding values like learning rates, dataset paths, or the number of training epochs directly into your script makes reuse difficult and hampers reproducibility. The goal is to treat the container image as an immutable artifact and supply the configuration externally for each specific training run.
There are two primary mechanisms for passing configuration parameters into a Docker container when you start it: environment variables and command-line arguments. Let's examine how to use these effectively for ML training workflows.
Using Environment Variables
Environment variables are key-value pairs that exist within the container's operating environment. They are a standard way to provide configuration details to applications running inside containers.
Defining Defaults in the Dockerfile
You can set default values for environment variables directly in your Dockerfile using the ENV instruction. This is useful for setting sensible defaults or defining paths that are standard within the container's structure.
# Base image (e.g., Python or a framework-specific image)
FROM python:3.9-slim
# Set a default path for training data within the container
ENV TRAINING_DATA_PATH=/data/train.csv
ENV MODEL_OUTPUT_DIR=/output
# Copy application code
WORKDIR /app
COPY . .
# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt
# Default command (assuming train.py reads environment variables)
CMD ["python", "train.py"]
Overriding Defaults at Runtime
The real flexibility comes from overriding these defaults or providing new environment variables when you launch the container using the -e or --env flag with docker run.
# Run training using the default data path
docker run my_training_image
# Run training, specifying a different data path
docker run -e TRAINING_DATA_PATH=/data/augmented_train.csv my_training_image
# Specify multiple variables, including one not defined in the Dockerfile
docker run \
-e TRAINING_DATA_PATH=/data/subset_train.csv \
-e MODEL_OUTPUT_DIR=/models/run123 \
-e LEARNING_RATE=0.001 \
my_training_image
Accessing Environment Variables in Python
Inside your Python training script, you can access these environment variables using the os module. It's good practice to retrieve these values at the beginning of your script.
import os

# --- Configuration ---
# Get paths from environment variables, providing defaults if not set
data_path = os.getenv('TRAINING_DATA_PATH', '/data/train.csv')  # Default fallback
model_dir = os.getenv('MODEL_OUTPUT_DIR', '/output')
# Get hyperparameters from environment variables (can also use argparse)
# os.getenv returns strings, so cast numeric values explicitly
learning_rate = float(os.getenv('LEARNING_RATE', '0.01'))  # Default fallback
epochs = int(os.getenv('EPOCHS', '10'))  # Default fallback
print(f"--- Training Configuration ---")
print(f"Data Path: {data_path}")
print(f"Model Output Directory: {model_dir}")
print(f"Learning Rate: {learning_rate}")
print(f"Epochs: {epochs}")
print(f"-----------------------------")
# --- Placeholder for actual training logic ---
# Load data from data_path
# Train model using learning_rate, epochs
# Save model to model_dir
print("Simulating model training...")
# --- End Placeholder ---
print("Training finished.")
Environment variables are well-suited for configuration parameters like file paths, API keys (handle secrets carefully, perhaps using Docker secrets or managed services in production), or simple flags and settings.
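When several variables need to travel together, docker run also accepts an --env-file flag that reads KEY=VALUE pairs from a file. A minimal sketch, assuming a file named train.env in the current directory:
# train.env
TRAINING_DATA_PATH=/data/subset_train.csv
LEARNING_RATE=0.001
EPOCHS=25
# Pass the whole file at once instead of repeating -e flags
docker run --env-file train.env my_training_image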
Using Command-Line Arguments
Another common way to configure applications is through command-line arguments passed when the program starts. This method is particularly effective for parameters that define a specific experiment, like hyperparameters.
Designing Scripts to Accept Arguments
Your training script needs to be designed to parse arguments passed to it. Python's built-in argparse module is excellent for this.
import os
import argparse

# --- Argument Parsing ---
parser = argparse.ArgumentParser(description='ML Model Training Script')
# Define command-line arguments (environment variables supply the defaults)
parser.add_argument('--data-path', type=str,
                    default=os.getenv('TRAINING_DATA_PATH', '/data/train.csv'),
                    help='Path to the training data file')
parser.add_argument('--model-dir', type=str,
                    default=os.getenv('MODEL_OUTPUT_DIR', '/output'),
                    help='Directory to save the trained model')
parser.add_argument('--learning-rate', type=float,
                    default=float(os.getenv('LEARNING_RATE', '0.01')),
                    help='Learning rate for the optimizer')
parser.add_argument('--epochs', type=int,
                    default=int(os.getenv('EPOCHS', '10')),
                    help='Number of training epochs')
# Parse the arguments
args = parser.parse_args()
print(f"--- Training Configuration (via ArgParse) ---")
print(f"Data Path: {args.data_path}")
print(f"Model Output Directory: {args.model_dir}")
print(f"Learning Rate: {args.learning_rate}")
print(f"Epochs: {args.epochs}")
print(f"-------------------------------------------")
# --- Placeholder for actual training logic ---
# Load data from args.data_path
# Train model using args.learning_rate, args.epochs
# Save model to args.model_dir
print("Simulating model training...")
# --- End Placeholder ---
print("Training finished.")
Notice how we can combine methods: argparse can define default values fetched from environment variables, providing flexibility.
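With this setup, an environment variable supplies the default and an explicit flag takes precedence over it. A quick sketch of that precedence, runnable outside Docker (the values are illustrative):
# The environment variable sets the argparse default, so this trains at 0.01
LEARNING_RATE=0.01 python train.py
# An explicit flag overrides the environment-derived default: trains at 0.005
LEARNING_RATE=0.01 python train.py --learning-rate 0.005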
Passing Arguments via docker run
To pass command-line arguments to the script inside the container, you simply append them to your docker run command after the image name. These arguments are passed to the container's ENTRYPOINT or CMD.
Assuming your Dockerfile has an ENTRYPOINT like:
# ... (rest of Dockerfile)
ENTRYPOINT ["python", "train.py"]
Or a CMD (though ENTRYPOINT is often preferred for scripts):
# ... (rest of Dockerfile)
CMD ["python", "train.py"]
You would run it like this:
# Run training with specific hyperparameters
docker run my_training_image \
--learning-rate 0.005 \
--epochs 25 \
--data-path /data/processed_features.pkl \
--model-dir /output/experiment_abc
These arguments (--learning-rate 0.005, etc.) are appended to the ENTRYPOINT (or override the CMD) and parsed by argparse within your train.py script.
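A related Dockerfile pattern, sketched below, combines the two instructions: ENTRYPOINT fixes the executable while CMD supplies default arguments. Anything you append to docker run replaces the entire CMD list, leaving the entrypoint intact (the default values shown are illustrative):
# ... (rest of Dockerfile)
# ENTRYPOINT fixes the program; CMD provides overridable default arguments
ENTRYPOINT ["python", "train.py"]
CMD ["--learning-rate", "0.01", "--epochs", "10"]
Running docker run my_training_image trains with the defaults, while docker run my_training_image --epochs 25 replaces the whole default argument list with --epochs 25, letting the other parameters fall back to the argparse defaults.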
Choosing Between Environment Variables and Arguments
Both environment variables and command-line arguments are valid ways to pass configuration. Here’s a general guideline:
Environment Variables (ENV, -e): Better suited for environment-specific settings (like paths that might differ between dev, staging, and prod), secrets (though dedicated secret management is better), or configurations that are less likely to change with every single training run. They define the context in which the script runs.
Command-Line Arguments (docker run ... <args>): Ideal for parameters specific to a particular execution or experiment, such as hyperparameters (learning_rate, batch_size), run identifiers, or flags controlling script behavior (--evaluate-only). They define the task the script should perform.
Using command-line arguments often makes the purpose of a specific docker run command clearer, as the experimental parameters are explicitly listed.
Using Configuration Files for Complex Settings
For intricate configurations involving nested parameters or long lists, passing everything via environment variables or command-line arguments can become cumbersome. A common pattern is to use a configuration file (e.g., config.yaml or params.json).
# Example: Mount config.yaml and tell the script where to find it
docker run -v $(pwd)/config.yaml:/app/config.yaml \
my_training_image --config /app/config.yaml
Your Python script would then use libraries like PyYAML or json to load /app/config.yaml.
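A minimal sketch of the loading side, assuming PyYAML is installed and that keys such as learning_rate and epochs exist in the mounted file (both the flag name and the keys are illustrative):
import argparse
import yaml  # provided by the PyYAML package

# Accept the --config flag used in the docker run example above
parser = argparse.ArgumentParser(description='ML Model Training Script')
parser.add_argument('--config', type=str, default='/app/config.yaml',
                    help='Path to a YAML configuration file')
args = parser.parse_args()

# Load the mounted configuration file into a plain dictionary
with open(args.config) as f:
    config = yaml.safe_load(f)

print(f"Learning Rate: {config.get('learning_rate')}")
print(f"Epochs: {config.get('epochs')}")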
By externalizing configuration using environment variables, command-line arguments, or mounted configuration files, you create flexible and reusable training containers. This separation allows you to run numerous experiments with different parameters using the exact same Docker image, significantly improving the reproducibility and manageability of your ML training workflows.