Let's put theory into practice by building and containerizing a simple machine learning inference API. We'll take a pre-trained model, wrap it in a lightweight web application using Flask, and then create a Dockerfile to package everything into an efficient, runnable container.
Prerequisites:
A pre-trained model saved as model.pkl. You can train a simple model (e.g., Logistic Regression on the Iris dataset) and save it using joblib or pickle; a minimal training sketch follows.
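If you don't already have a saved model, the following helper is one way to produce model.pkl. It is a sketch, not part of the container, and assumes scikit-learn and joblib are installed; the file name train_model.py is just an illustrative choice.

# train_model.py (hypothetical helper script, run once before building the image)
import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Load the Iris dataset: 4 numeric features per sample, 3 classes
X, y = load_iris(return_X_y=True)

# Fit a simple classifier; max_iter raised so the solver converges cleanly
model = LogisticRegression(max_iter=200)
model.fit(X, y)

# Persist the fitted model next to app.py
joblib.dump(model, "model.pkl")
print("Saved model.pkl")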
Project Structure:
Organize your project files as follows:
.
├── app.py # Flask application code
├── model.pkl # Your pre-trained model file
├── requirements.txt # Python dependencies
└── Dockerfile # Instructions to build the Docker image
1. Create the Inference API (app.py)
This Python script will use Flask to create a web server. It loads our pre-trained model and exposes an endpoint (e.g., /predict) that accepts data, uses the model to make predictions, and returns the results.
# app.py
import joblib
import numpy as np
from flask import Flask, request, jsonify
import os

# Initialize Flask app
app = Flask(__name__)

# Load the pre-trained model
# Ensure model.pkl is in the same directory or provide the correct path
model_path = os.path.join(os.path.dirname(__file__), 'model.pkl')
try:
    model = joblib.load(model_path)
    print("Model loaded successfully.")
except FileNotFoundError:
    print(f"Error: Model file not found at {model_path}")
    model = None
except Exception as e:
    print(f"Error loading model: {e}")
    model = None

@app.route('/predict', methods=['POST'])
def predict():
    if model is None:
        return jsonify({'error': 'Model not loaded'}), 500
    try:
        # Get data from POST request
        data = request.get_json(force=True)

        # Ensure data is in the expected format (e.g., list of features)
        # Adapt this part based on your model's input requirements
        if 'features' not in data:
            return jsonify({'error': 'Missing "features" key in JSON payload'}), 400

        features = np.array(data['features'])

        # Perform prediction
        # Reshape if your model expects a 2D array (e.g., for a single sample)
        if features.ndim == 1:
            features = features.reshape(1, -1)
        prediction = model.predict(features)

        prediction_proba = None
        if hasattr(model, "predict_proba"):
            # Get probabilities if the model supports it
            prediction_proba = model.predict_proba(features).tolist()

        # Return prediction as JSON response
        response = {
            'prediction': prediction.tolist(),
        }
        if prediction_proba:
            response['probabilities'] = prediction_proba
        return jsonify(response)

    except Exception as e:
        print(f"Prediction error: {e}")
        return jsonify({'error': f'An error occurred during prediction: {str(e)}'}), 500

@app.route('/health', methods=['GET'])
def health_check():
    # Simple health check endpoint
    # You could add checks here (e.g., model loaded)
    if model is not None:
        return jsonify({'status': 'ok'}), 200
    else:
        return jsonify({'status': 'error', 'reason': 'Model not loaded'}), 500

if __name__ == '__main__':
    # Run the app using Flask's built-in server for development
    # For production, use a WSGI server like Gunicorn (configured in Dockerfile)
    app.run(host='0.0.0.0', port=5000, debug=False)  # Set debug=False for production simulation
This script defines two routes:
- /predict: Accepts POST requests with JSON data containing a features key. It uses the loaded model.pkl to predict and returns the result.
- /health: A simple GET endpoint to check if the service is running and the model loaded.
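Before building the image, you can verify the API locally with Flask's built-in test client. This is a sketch that assumes model.pkl is already in the project directory; no server needs to be running, and the file name smoke_test.py is just illustrative.

# smoke_test.py (hypothetical local check, run from the project directory)
from app import app

# Flask's test client calls the routes in-process, without starting a server
client = app.test_client()

health = client.get("/health")
print("Health:", health.status_code, health.get_json())

response = client.post("/predict", json={"features": [5.1, 3.5, 1.4, 0.2]})
print("Prediction:", response.status_code, response.get_json())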
2. Define Dependencies (requirements.txt)
List the Python libraries needed for your API. For production, we include gunicorn, a robust WSGI server.
# requirements.txt
flask
scikit-learn # Or the library used for your model (e.g., tensorflow, torch)
numpy
joblib # Or pickle if you used that for saving the model
gunicorn # WSGI server for production
3. Create the Dockerfile
This file contains the instructions Docker uses to build your image. We'll start with a basic version and then show an optimized multi-stage build.
Basic Dockerfile:
# Dockerfile (Basic)
# 1. Base Image: Start with an official Python runtime
FROM python:3.9-slim
# 2. Set Working Directory: Define the context for subsequent instructions
WORKDIR /app
# 3. Copy Dependencies File: Copy requirements first for layer caching
COPY requirements.txt .
# 4. Install Dependencies: Install Python packages
# --no-cache-dir: Reduces image size by not storing the pip cache
# --upgrade pip: Ensure pip is up-to-date
RUN pip install --no-cache-dir --upgrade pip && \
    pip install --no-cache-dir -r requirements.txt
# 5. Copy Application Code and Model: Copy the rest of your application
COPY . .
# 6. Expose Port: Inform Docker the container listens on port 5000
EXPOSE 5000
# 7. Define Run Command: Specify command to run the application using Gunicorn
# --bind 0.0.0.0:5000: Listen on all network interfaces inside the container
# app:app: Refers to the Flask app object named 'app' within the 'app.py' file
CMD ["gunicorn", "--bind", "0.0.0.0:5000", "app:app"]
Explanation:
- FROM python:3.9-slim: Uses a lightweight official Python image.
- WORKDIR /app: Sets the current directory inside the container to /app.
- COPY requirements.txt .: Copies only the requirements file first. Docker caches layers, so if requirements.txt doesn't change, the pip install layer won't need to be rebuilt, speeding up subsequent builds.
- RUN pip install ...: Installs the dependencies using pip. --no-cache-dir prevents pip from storing downloaded packages, saving space.
- COPY . .: Copies the rest of the project files (app.py, model.pkl) into the /app directory in the container.
- EXPOSE 5000: Documents that the application inside the container will listen on port 5000. This doesn't publish the port; it's metadata.
- CMD ["gunicorn", ...]: Specifies the default command to run when the container starts. We use Gunicorn for a production-ready server, binding it to 0.0.0.0 so it's accessible from outside the container (when mapped) and specifying our Flask application object (app inside app.py).
4. Build the Docker Image
Open your terminal in the project directory (where the Dockerfile is) and run the build command:
docker build -t ml-inference-api:basic .
- docker build: The command to build an image.
- -t ml-inference-api:basic: Tags the image with a name (ml-inference-api) and a tag (basic).
- .: Specifies the build context (the current directory). Docker sends files from this directory to the Docker daemon.
5. Run the Container
Now, run a container from the image you just built:
docker run -d -p 5001:5000 --name my_api ml-inference-api:basic
- docker run: The command to create and start a container.
- -d: Runs the container in detached mode (in the background).
- -p 5001:5000: Maps port 5001 on your host machine to port 5000 inside the container. This allows you to access the API via localhost:5001.
- --name my_api: Assigns a name to the running container for easier management.
- ml-inference-api:basic: The image to run.
You can check if the container is running using docker ps.
6. Test the API
You can test the running API using curl or a simple Python script. Assuming your model expects 4 features:
Using curl from your terminal:
curl -X POST http://localhost:5001/predict \
-H "Content-Type: application/json" \
-d '{"features": [5.1, 3.5, 1.4, 0.2]}' # Example Iris features
# Expected output (will vary based on your model):
# {"prediction":[0]} or {"prediction":[0], "probabilities":[[0.98, 0.01, 0.01]]}
Check the health endpoint:
curl http://localhost:5001/health
# Expected output:
# {"status":"ok"}
7. Optimizing with Multi-Stage Builds
The basic image includes build tools and potentially large libraries not strictly needed for running the inference service. A multi-stage build creates smaller, more secure production images.
Optimized Dockerfile (Multi-Stage):
# Dockerfile (Multi-Stage)
# ---- Build Stage ----
# Use a full Python image to install dependencies, including any build-time needs
FROM python:3.9 as builder
WORKDIR /build
# Install build essentials if needed (e.g., for C extensions)
# RUN apt-get update && apt-get install -y --no-install-recommends build-essential && rm -rf /var/lib/apt/lists/*
COPY requirements.txt .
# Install dependencies into a specific directory
RUN pip install --no-cache-dir --upgrade pip && \
    pip install --no-cache-dir --prefix="/install" -r requirements.txt
# ---- Runtime Stage ----
# Use a minimal base image for the final runtime environment
FROM python:3.9-slim
WORKDIR /app
# Copy only the installed packages from the build stage
COPY --from=builder /install /usr/local
# Copy application code and model
COPY app.py .
COPY model.pkl .
# Expose port
EXPOSE 5000
# Define run command (same as before)
CMD ["gunicorn", "--bind", "0.0.0.0:5000", "app:app"]
Explanation of Changes:
- FROM python:3.9 as builder: Defines the first stage, named builder. This stage can use a larger image if needed for compilation.
- pip install --prefix="/install": Installs packages into a specific directory (/install) within the builder stage, rather than the default system Python location.
- FROM python:3.9-slim: Starts a new, clean, minimal base image for the final runtime stage.
- COPY --from=builder /install /usr/local: This is the significant part. It copies only the contents of the /install directory (where our dependencies were installed) from the builder stage into the final image's standard library location (/usr/local). Build tools and caches from the builder stage are discarded.
- COPY app.py ., COPY model.pkl .: Copies the necessary application files.
Build and Compare:
Build the optimized image:
docker build -t ml-inference-api:optimized .
Compare image sizes:
docker images ml-inference-api
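If you prefer to script the comparison, the Docker SDK for Python (an optional extra, installed with pip install docker) can list the tagged images and their sizes. This is a rough sketch, not part of the exercise, and assumes the Docker daemon is running locally.

# compare_image_sizes.py (hypothetical helper; requires: pip install docker)
import docker

client = docker.from_env()

# List every local image tagged ml-inference-api and print its size in MB
for image in client.images.list(name="ml-inference-api"):
    size_mb = image.attrs["Size"] / (1024 * 1024)
    print(f"{', '.join(image.tags)}: {size_mb:.1f} MB")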
You should see that ml-inference-api:optimized is significantly smaller than ml-inference-api:basic. Run it just like the basic version (adjusting the port if needed):
# Stop and remove the previous container first if running
docker stop my_api
docker rm my_api
# Run the optimized version
docker run -d -p 5001:5000 --name my_api_optimized ml-inference-api:optimized
Test it again using curl as before.
Summary:
In this hands-on exercise, you successfully:
- Wrapped a pre-trained model in a Flask inference API (app.py).
- Listed its dependencies in requirements.txt.
- Wrote a Dockerfile to package the API, model, and dependencies.
- Built the image, ran the container, and tested the /predict and /health endpoints.
- Reduced the image size with a multi-stage build.
You now have a practical workflow for turning a trained model into a distributable, containerized inference service, ready for further deployment steps. Remember to adapt the app.py logic and requirements.txt based on your specific model and its dependencies.