This practical exercise guides you through a complete workflow: packaging a simple model-serving application into a Docker container and then deploying it on a local Kubernetes cluster. You will write the necessary configuration files for both Docker and Kubernetes, use kubectl to manage the application, and finally, test the live endpoint. This process mirrors a common pattern for moving ML services towards a production environment.
Before you begin, ensure you have the following tools installed and configured on your machine:
- Docker, for building and running container images.
- A local Kubernetes cluster, such as Minikube or Docker Desktop with Kubernetes enabled.
- kubectl, the Kubernetes command-line tool. You can confirm it is installed with kubectl version --client.

We will use a simple Flask web application that serves predictions from a pre-trained Scikit-learn model. The model is a classic classifier trained on the Iris dataset. The goal is to keep the application simple to focus on the deployment infrastructure.
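If you want a quick sanity check before creating any files, the following commands confirm each tool is reachable from your terminal (this assumes Minikube as the local cluster; adjust the last command for your setup):

docker --version
kubectl version --client
minikube version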
First, create a new directory for this project called iris-k8s-app. Inside this directory, create the following three files.
1. model.joblib
You don't need to create this file yourself. For this exercise, we assume you have a pre-trained model saved. Its specific contents are not important, only that our Python script can load it.
2. requirements.txt
This file lists the Python libraries our application needs.
flask==2.2.2
scikit-learn==1.1.2
joblib==1.2.0
numpy==1.23.4
3. app.py
This is our Python script. It loads the model and creates a Flask web server with a single /predict endpoint that accepts POST requests with flower measurement data and returns a prediction.
import joblib
import numpy as np
from flask import Flask, request, jsonify

# Initialize the Flask application
app = Flask(__name__)

# Load the pre-trained model
# In a real scenario, you would have this model file.
# For this lab, we will simulate its presence and loading.
# model = joblib.load('model.joblib')

@app.route('/predict', methods=['POST'])
def predict():
    try:
        # Get the JSON data from the request
        data = request.get_json(force=True)

        # Extract features and convert to a numpy array for the model
        # Expecting a list of 4 floats: [sepal_length, sepal_width, petal_length, petal_width]
        features = np.array(data['features']).reshape(1, -1)

        # A mock prediction since we don't have the actual model file.
        # In a real run, this would be: prediction = model.predict(features)
        # We will simulate a prediction based on petal length (feature index 2)
        if features[0, 2] < 2.5:
            prediction_result = 'setosa'
        else:
            prediction_result = 'versicolor_or_virginica'

        # Return the prediction as JSON
        return jsonify({'prediction': prediction_result})
    except Exception as e:
        return jsonify({'error': str(e)}), 400

if __name__ == '__main__':
    # Run the app on host 0.0.0.0 so it is reachable from outside the container
    app.run(host='0.0.0.0', port=5000)
Note on the Model: The Python code includes a joblib.load line that is commented out and replaced with mock logic. This simplifies the lab by removing the need to download or train a model, allowing us to focus entirely on the containerization and orchestration steps.
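If you would prefer a real model.joblib instead of the mock logic, a minimal sketch is to train a classifier on the Iris dataset that ships with Scikit-learn and serialize it with joblib. The script below is one way to do it; the filename model.joblib matches what app.py expects.

# train_model.py -- optional helper to produce model.joblib
import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Load the Iris dataset (features and class labels)
X, y = load_iris(return_X_y=True)

# Train a simple classifier; any Scikit-learn estimator would work here
model = LogisticRegression(max_iter=200)
model.fit(X, y)

# Serialize the trained model to the file app.py expects
joblib.dump(model, 'model.joblib')

If you take this route, you can also uncomment the joblib.load line in app.py and replace the mock branch with prediction = model.predict(features).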
The first step is to create a Dockerfile that defines how to build an image containing our application and its dependencies.
Create a file named Dockerfile in your project directory with the following content:
# Start from a slim Python base image
FROM python:3.9-slim
# Set the working directory inside the container
WORKDIR /app
# Copy the requirements file into the container
COPY requirements.txt .
# Install the Python dependencies
RUN pip install --no-cache-dir -r requirements.txt
# Copy the rest of the application code into the container
COPY . .
# Expose the port the app runs on
EXPOSE 5000
# Define the command to run the application
CMD ["python", "app.py"]
Now, open your terminal in the iris-k8s-app directory and run the following command to build the image. We will tag it iris-app:v1.
docker build -t iris-app:v1 .
After the build completes, you can verify it by running the container locally.
docker run -p 5000:5000 iris-app:v1
You should see output from Flask indicating the server is running. You can stop it with Ctrl+C.
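One caveat for local clusters: Kubernetes pulls images from the container runtime inside the cluster, not from your host's Docker daemon. If you are using Minikube and the Pods later sit in ErrImagePull or ImagePullBackOff, load the local image into the cluster first, or build the image against Minikube's Docker daemon:

# Option 1: copy the already-built image into the Minikube node
minikube image load iris-app:v1

# Option 2: point your shell at Minikube's Docker daemon, then rebuild
eval $(minikube docker-env)
docker build -t iris-app:v1 .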
A Kubernetes Deployment is a resource object that manages a set of identical Pods. It ensures that a specified number of application replicas are running and handles updates or rollbacks.
Create a file named deployment.yaml with the following content:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: iris-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: iris-server
  template:
    metadata:
      labels:
        app: iris-server
    spec:
      containers:
        - name: iris-app-container
          image: iris-app:v1
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 5000
Let's break down this file:
- replicas: 2: Instructs Kubernetes to maintain two running instances (Pods) of our application for availability.
- selector: This tells the Deployment which Pods to manage. It finds Pods with the label app: iris-server.
- template: This section defines the Pod that will be created. It includes:
  - metadata.labels: A label app: iris-server is attached to each Pod. This is how the selector finds them.
  - spec.containers: Defines the container(s) to run inside the Pod.
    - image: iris-app:v1: Specifies the Docker image to use.
    - imagePullPolicy: IfNotPresent: Tells Kubernetes to use the local image if it exists, rather than trying to pull it from a remote registry. This is useful for local development.
    - containerPort: 5000: Informs Kubernetes that the container listens on port 5000.

Pods in Kubernetes are ephemeral and have internal IP addresses that can change. To provide a stable network endpoint for accessing our application, we use a Service.
We will use the NodePort service type, which exposes the application on a static port on each node in the cluster. This is a straightforward method for accessing an application during development.
Create a file named service.yaml:
apiVersion: v1
kind: Service
metadata:
  name: iris-service
spec:
  type: NodePort
  selector:
    app: iris-server
  ports:
    - protocol: TCP
      port: 80
      targetPort: 5000
Here's what this file defines:
- type: NodePort: Exposes this service on the IP of the cluster nodes at a specific port.
- selector: This is the important link. The service will route traffic to any Pod that has the label app: iris-server, which matches the Pods created by our Deployment.
- ports: This section maps network ports. It states that the service will accept traffic on port 80 and forward it to targetPort: 5000 on the selected Pods.

This captures the relationship between the Kubernetes objects we created: a user sends a request to the stable Service, which then forwards the traffic to one of the Pods managed by the Deployment.
With the image built and the manifest files written, you can now deploy the application. Run the kubectl apply command for each file.
# Apply the deployment configuration
kubectl apply -f deployment.yaml
# Apply the service configuration
kubectl apply -f service.yaml
You can check the status of your resources:
# Check that the deployment is progressing
kubectl get deployment
# Expected output:
# NAME READY UP-TO-DATE AVAILABLE AGE
# iris-deployment 2/2 2 2 15s
# Check that two pods are running
kubectl get pods
# Expected output:
# NAME READY STATUS RESTARTS AGE
# iris-deployment-5c68f6d787-abcde 1/1 Running 0 25s
# iris-deployment-5c68f6d787-fghij 1/1 Running 0 25s
# Check that the service is created
kubectl get service iris-service
# Expected output:
# NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
# iris-service NodePort 10.101.102.103 <none> 80:31234/TCP 35s
Pay attention to the PORT(S) column for the service. The cluster has assigned a high-numbered port (e.g., 31234) which is mapped to the service's internal port 80. This is the port you will use to access the application.
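If you prefer to read the assigned NodePort without scanning the table, or want to confirm that the Service actually found the Pods created by the Deployment, these commands are useful (the exact output will differ in your cluster):

# Print only the NodePort assigned to the service
kubectl get service iris-service -o jsonpath='{.spec.ports[0].nodePort}'

# List the Pod IPs the service is routing traffic to
kubectl get endpoints iris-service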
To test the service, you need the IP address of your cluster node and the NodePort. For local clusters, the IP is usually localhost or can be found with minikube ip.
If using Minikube, you can get the full URL with one command:
minikube service iris-service --url
This will output a URL like http://192.168.49.2:31234.
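If you are not using Minikube, or the node IP is not directly reachable from your machine, kubectl port-forward is an alternative that works on any cluster. It forwards a local port to the Service's port 80 (the command keeps running until you stop it), so the request in the next step would go to http://localhost:8080/predict instead:

kubectl port-forward service/iris-service 8080:80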
Now, use curl to send a POST request with some sample data to your service's /predict endpoint. Use the URL you obtained in the previous step.
curl -X POST \
-H "Content-Type: application/json" \
-d '{"features": [5.1, 3.5, 1.4, 0.2]}' \
http://<YOUR_NODE_IP>:<YOUR_NODE_PORT>/predict
You should receive a JSON response from the model running inside one of your Pods:
{"prediction":"setosa"}
Congratulations, you have successfully containerized a machine learning application and deployed it on Kubernetes.
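Before cleaning up, you can optionally watch the Deployment act as a replica manager. For example, scale to three Pods and observe the new one being created:

kubectl scale deployment iris-deployment --replicas=3
kubectl get pods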
To remove the resources from your cluster, use the kubectl delete command with the same manifest files.
kubectl delete -f service.yaml
kubectl delete -f deployment.yaml
This lab demonstrates the fundamental pattern for deploying ML models as scalable, managed services. By defining your application declaratively in YAML files, you can easily replicate, scale, and manage your deployments in any Kubernetes environment.