In this hands-on exercise, we'll build and containerize a simple machine learning inference API: we'll take a pre-trained model, wrap it in a lightweight web application using Flask, and then create a Dockerfile to package everything into an efficient, runnable container.

## Prerequisites

- A pre-trained machine learning model saved to a file. For this example, we'll assume you have a scikit-learn model saved as `model.pkl`. You can train a simple model (e.g., Logistic Regression on the Iris dataset) and save it using `joblib` or `pickle`; a minimal training sketch is included at the end of step 1 below.
- Basic familiarity with Python and Flask.
- Docker installed and running on your system.

## Project Structure

Organize your project files as follows:

```
.
├── app.py            # Flask application code
├── model.pkl         # Your pre-trained model file
├── requirements.txt  # Python dependencies
└── Dockerfile        # Instructions to build the Docker image
```

## 1. Create the Inference API (app.py)

This Python script will use Flask to create a web server. It loads our pre-trained model and exposes an endpoint (e.g., `/predict`) that accepts data, uses the model to make predictions, and returns the results.

```python
# app.py
import joblib
import numpy as np
from flask import Flask, request, jsonify
import os

# Initialize Flask app
app = Flask(__name__)

# Load the pre-trained model
# Ensure model.pkl is in the same directory or provide the correct path
model_path = os.path.join(os.path.dirname(__file__), 'model.pkl')
try:
    model = joblib.load(model_path)
    print("Model loaded successfully.")
except FileNotFoundError:
    print(f"Error: Model file not found at {model_path}")
    model = None
except Exception as e:
    print(f"Error loading model: {e}")
    model = None

@app.route('/predict', methods=['POST'])
def predict():
    if model is None:
        return jsonify({'error': 'Model not loaded'}), 500

    try:
        # Get data from POST request
        data = request.get_json(force=True)

        # Ensure data is in the expected format (e.g., list of features)
        # Adapt this part based on your model's input requirements
        if 'features' not in data:
            return jsonify({'error': 'Missing "features" key in JSON payload'}), 400

        features = np.array(data['features'])

        # Perform prediction
        # Reshape if your model expects a 2D array (e.g., for a single sample)
        if features.ndim == 1:
            features = features.reshape(1, -1)

        prediction = model.predict(features)

        prediction_proba = None
        if hasattr(model, "predict_proba"):
            # Get probabilities if the model supports it
            prediction_proba = model.predict_proba(features).tolist()

        # Return prediction as JSON response
        response = {
            'prediction': prediction.tolist(),
        }
        if prediction_proba:
            response['probabilities'] = prediction_proba

        return jsonify(response)

    except Exception as e:
        print(f"Prediction error: {e}")
        return jsonify({'error': f'An error occurred during prediction: {str(e)}'}), 500

@app.route('/health', methods=['GET'])
def health_check():
    # Simple health check endpoint
    # You could add further checks here (e.g., model loaded)
    if model is not None:
        return jsonify({'status': 'ok'}), 200
    else:
        return jsonify({'status': 'error', 'reason': 'Model not loaded'}), 500

if __name__ == '__main__':
    # Run the app using Flask's built-in server for development
    # For production, use a WSGI server like Gunicorn (configured in Dockerfile)
    app.run(host='0.0.0.0', port=5000, debug=False)  # Set debug=False for production simulation
```

This script defines two routes:

- `/predict`: Accepts POST requests with JSON data containing a `features` key. It uses the loaded `model.pkl` to make a prediction and returns the result.
- `/health`: A simple GET endpoint to check whether the service is running and the model is loaded.
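If you don't already have a `model.pkl`, here is a minimal training sketch along the lines described in the prerequisites (Logistic Regression on the Iris dataset, saved with joblib). The filename `train_model.py` and the exact hyperparameters are illustrative assumptions, not part of the project files above:

```python
# train_model.py -- illustrative sketch; adapt to your own model and data
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
import joblib

# Load the Iris dataset (4 features per sample, 3 classes)
X, y = load_iris(return_X_y=True)

# Train a simple classifier; max_iter raised so the solver converges cleanly
model = LogisticRegression(max_iter=200)
model.fit(X, y)

# Save the fitted model next to app.py so the Dockerfile's COPY picks it up
joblib.dump(model, "model.pkl")
print("Saved model.pkl")
```

Run it once (e.g., `python train_model.py`) before building the image so that `model.pkl` exists in the build context.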
## 2. Define Dependencies (requirements.txt)

List the Python libraries needed for your API. For production, we include gunicorn, a WSGI server.

```
# requirements.txt
flask
scikit-learn   # Or the library used for your model (e.g., tensorflow, torch)
numpy
joblib         # Or pickle if you used that for saving the model
gunicorn       # WSGI server for production
```

## 3. Create the Dockerfile

This file contains the instructions Docker uses to build your image. We'll start with a basic version and then show an optimized multi-stage build.

Basic Dockerfile:

```dockerfile
# Dockerfile (Basic)

# 1. Base Image: Start with an official Python runtime
FROM python:3.9-slim

# 2. Set Working Directory: Define the context for subsequent instructions
WORKDIR /app

# 3. Copy Dependencies File: Copy requirements first for layer caching
COPY requirements.txt .

# 4. Install Dependencies: Install Python packages
#    --no-cache-dir: Reduces image size by not storing the pip cache
#    --upgrade pip: Ensure pip is up-to-date
RUN pip install --no-cache-dir --upgrade pip && \
    pip install --no-cache-dir -r requirements.txt

# 5. Copy Application Code and Model: Copy the rest of your application
COPY . .

# 6. Expose Port: Inform Docker the container listens on port 5000
EXPOSE 5000

# 7. Define Run Command: Specify command to run the application using Gunicorn
#    --bind 0.0.0.0:5000: Listen on all network interfaces inside the container
#    app:app: Refers to the Flask app object named 'app' within the 'app.py' file
CMD ["gunicorn", "--bind", "0.0.0.0:5000", "app:app"]
```

Explanation:

- `FROM python:3.9-slim`: Uses a lightweight official Python image.
- `WORKDIR /app`: Sets the current directory inside the container to /app.
- `COPY requirements.txt .`: Copies only the requirements file first. Docker caches layers, so if requirements.txt doesn't change, the pip install layer won't need to be rebuilt, speeding up subsequent builds.
- `RUN pip install ...`: Installs the dependencies using pip. `--no-cache-dir` prevents pip from storing downloaded packages, saving space.
- `COPY . .`: Copies the rest of the project files (app.py, model.pkl) into the /app directory in the container.
- `EXPOSE 5000`: Documents that the application inside the container will listen on port 5000. This doesn't publish the port; it's metadata.
- `CMD ["gunicorn", ...]`: Specifies the default command to run when the container starts. We use Gunicorn for a production-ready server, binding it to 0.0.0.0 so it's accessible from outside the container (when mapped) and specifying our Flask application object (`app` inside `app.py`).

## 4. Build the Docker Image

Open your terminal in the project directory (where the Dockerfile is) and run the build command:

```bash
docker build -t ml-inference-api:basic .
```

- `docker build`: The command to build an image.
- `-t ml-inference-api:basic`: Tags the image with a name (ml-inference-api) and a tag (basic).
- `.`: Specifies the build context (the current directory). Docker sends files from this directory to the Docker daemon.

## 5. Run the Container

Now, run a container from the image you just built:

```bash
docker run -d -p 5001:5000 --name my_api ml-inference-api:basic
```

- `docker run`: The command to create and start a container.
- `-d`: Runs the container in detached mode (in the background).
- `-p 5001:5000`: Maps port 5001 on your host machine to port 5000 inside the container. This allows you to access the API via localhost:5001.
- `--name my_api`: Assigns a name to the running container for easier management.
- `ml-inference-api:basic`: The image to run.

You can check whether the container is running using `docker ps`.
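As an optional sanity check at this point, here is a small sketch that polls the container's `/health` endpoint from the host until the model reports as loaded. It assumes the `requests` package is installed on your host; the filename `check_health.py` is hypothetical:

```python
# check_health.py -- illustrative sketch, run on the host machine
import time

import requests

HEALTH_URL = "http://localhost:5001/health"  # host port mapped via `docker run -p 5001:5000`

# Poll the health endpoint a few times while the container starts up
for attempt in range(10):
    try:
        resp = requests.get(HEALTH_URL, timeout=2)
        if resp.status_code == 200 and resp.json().get("status") == "ok":
            print("Service is healthy:", resp.json())
            break
        print("Not healthy yet:", resp.status_code, resp.text)
    except requests.ConnectionError:
        print("Container not reachable yet...")
    time.sleep(1)
else:
    print("Service did not become healthy; try `docker logs my_api` to see why.")
```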
## 6. Test the API

You can test the running API using curl or a simple Python script (a short requests-based sketch appears later, after the image-size comparison in step 7). Assuming your model expects 4 features:

Using curl from your terminal:

```bash
curl -X POST http://localhost:5001/predict \
     -H "Content-Type: application/json" \
     -d '{"features": [5.1, 3.5, 1.4, 0.2]}'   # Example Iris features

# Expected output (will vary based on your model):
# {"prediction":[0]} or {"prediction":[0], "probabilities":[[0.98, 0.01, 0.01]]}
```

Check the health endpoint:

```bash
curl http://localhost:5001/health

# Expected output:
# {"status":"ok"}
```

## 7. Optimizing with Multi-Stage Builds

The basic image includes build tools and potentially large libraries not strictly needed for running the inference service. A multi-stage build creates smaller, more secure production images.

Optimized Dockerfile (Multi-Stage):

```dockerfile
# Dockerfile (Multi-Stage)

# ---- Build Stage ----
# Use a full Python image to install dependencies, including any build-time needs
FROM python:3.9 as builder

WORKDIR /build

# Install build essentials if needed (e.g., for C extensions)
# RUN apt-get update && apt-get install -y --no-install-recommends build-essential && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .

# Install dependencies into a specific directory
RUN pip install --no-cache-dir --upgrade pip && \
    pip install --no-cache-dir --prefix="/install" -r requirements.txt

# ---- Runtime Stage ----
# Use a minimal base image for the final runtime environment
FROM python:3.9-slim

WORKDIR /app

# Copy only the installed packages from the build stage
COPY --from=builder /install /usr/local

# Copy application code and model
COPY app.py .
COPY model.pkl .

# Expose port
EXPOSE 5000

# Define run command (same as before)
CMD ["gunicorn", "--bind", "0.0.0.0:5000", "app:app"]
```

Explanation of Changes:

- `FROM python:3.9 as builder`: Defines the first stage, named `builder`. This stage can use a larger image if needed for compilation.
- `pip install --prefix="/install"`: Installs packages into a specific directory (/install) within the builder stage, rather than the default system Python location.
- `FROM python:3.9-slim`: Starts a new, clean, minimal base image for the final runtime stage.
- `COPY --from=builder /install /usr/local`: This is the significant part. It copies only the contents of the /install directory (where our dependencies were installed) from the builder stage into the final image's standard library location (/usr/local). Build tools and caches from the builder stage are discarded.
- `COPY app.py .`, `COPY model.pkl .`: Copies the necessary application files.

Build and Compare:

Build the optimized image:

```bash
docker build -t ml-inference-api:optimized .
```

Compare image sizes:

```bash
docker images ml-inference-api
```

You should see that ml-inference-api:optimized is significantly smaller than ml-inference-api:basic.
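For the "simple Python script" alternative mentioned in step 6, here is a minimal sketch that exercises the `/predict` endpoint; it works against either the basic or the optimized container as long as port 5001 is mapped. The filename `test_api.py` and the use of the `requests` package are assumptions:

```python
# test_api.py -- illustrative sketch for calling the /predict endpoint
import requests

PREDICT_URL = "http://localhost:5001/predict"  # host port mapped via `docker run -p 5001:5000`

# One sample with 4 features, matching the curl example (Iris-style input)
payload = {"features": [5.1, 3.5, 1.4, 0.2]}

resp = requests.post(PREDICT_URL, json=payload, timeout=5)
resp.raise_for_status()  # Fail loudly on HTTP errors

result = resp.json()
print("Prediction:", result["prediction"])
if "probabilities" in result:
    print("Probabilities:", result["probabilities"])
```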
Run it just like the basic version (adjusting the port if needed):

```bash
# Stop and remove the previous container first if it is running
docker stop my_api
docker rm my_api

# Run the optimized version
docker run -d -p 5001:5000 --name my_api_optimized ml-inference-api:optimized
```

Test it again using curl as before.

## Summary

In this hands-on exercise, you successfully:

- Created a simple Flask API to serve predictions from a pre-trained model.
- Defined Python dependencies in requirements.txt.
- Wrote a Dockerfile to package the API, model, and dependencies.
- Built a basic Docker image containing the inference service.
- Implemented a multi-stage build to create a significantly smaller, optimized production image.
- Ran the containerized API and tested its functionality.

You now have a practical workflow for turning a trained model into a distributable, containerized inference service, ready for further deployment steps. Remember to adapt the app.py logic and requirements.txt based on your specific model and its dependencies.