This practical exercise guides you through a complete workflow: packaging a simple model-serving application into a Docker container and then deploying it on a local Kubernetes cluster. You will write the necessary configuration files for both Docker and Kubernetes, use kubectl to manage the application, and finally, test the live endpoint. This process mirrors a common pattern for moving ML services towards a production environment.
Before you begin, ensure you have the following tools installed and configured on your machine:
- Docker, for building and running container images.
- A local Kubernetes cluster, such as Minikube or Docker Desktop with Kubernetes enabled.
- kubectl, the Kubernetes command-line tool. You can confirm it is installed with kubectl version --client.

We will use a simple Flask web application that serves predictions from a pre-trained Scikit-learn model. The model is a classic classifier trained on the Iris dataset. The goal is to keep the application simple to focus on the deployment infrastructure.
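If you want a quick sanity check before creating any files, the following commands confirm each tool is reachable from your terminal (this assumes Minikube as the local cluster; adjust the last command for your setup):

docker --version
kubectl version --client
minikube version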
First, create a new directory for this project called iris-k8s-app. Inside this directory, create the following three files.
1. model.joblib
You don't need to create this file yourself. For this exercise, we assume you have a pre-trained model saved. Its specific contents are not important, only that our Python script can load it.
2. requirements.txt
This file lists the Python libraries our application needs.
flask==2.2.2
scikit-learn==1.1.2
joblib==1.2.0
numpy==1.23.4
3. app.py
This is our Python script. It loads the model and creates a Flask web server with a single /predict endpoint that accepts POST requests with flower measurement data and returns a prediction.
import joblib
import numpy as np
from flask import Flask, request, jsonify

# Initialize the Flask application
app = Flask(__name__)

# Load the pre-trained model
# In a real scenario, you would have this model file.
# For this lab, we will simulate its presence and loading.
# model = joblib.load('model.joblib')

@app.route('/predict', methods=['POST'])
def predict():
    try:
        # Get the JSON data from the request
        data = request.get_json(force=True)

        # Extract features and convert to a numpy array for the model
        # Expecting a list of 4 floats: [sepal_length, sepal_width, petal_length, petal_width]
        features = np.array(data['features']).reshape(1, -1)

        # A mock prediction since we don't have the actual model file.
        # In a real run, this would be: prediction = model.predict(features)
        # We will simulate a prediction based on petal length (feature index 2)
        if features[0, 2] < 2.5:
            prediction_result = 'setosa'
        else:
            prediction_result = 'versicolor_or_virginica'

        # Return the prediction as JSON
        return jsonify({'prediction': prediction_result})
    except Exception as e:
        return jsonify({'error': str(e)}), 400

if __name__ == '__main__':
    # Run the app on host 0.0.0.0 so it is reachable from outside the container
    app.run(host='0.0.0.0', port=5000)
Note on the Model: The Python code includes a joblib.load line that is commented out and replaced with mock logic. This simplifies the lab by removing the need to download or train a model, allowing us to focus entirely on the containerization and orchestration steps.
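If you would prefer a real model.joblib instead of the mock logic, a minimal sketch is to train a classifier on the Iris dataset that ships with Scikit-learn and serialize it with joblib. The script below is one way to do it; the filename model.joblib matches what app.py expects.

# train_model.py -- optional helper to produce model.joblib
import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Load the Iris dataset (features and class labels)
X, y = load_iris(return_X_y=True)

# Train a simple classifier; any Scikit-learn estimator would work here
model = LogisticRegression(max_iter=200)
model.fit(X, y)

# Serialize the trained model to the file app.py expects
joblib.dump(model, 'model.joblib')

If you take this route, you can also uncomment the joblib.load line in app.py and replace the mock branch with prediction = model.predict(features).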
The first step is to create a Dockerfile that defines how to build an image containing our application and its dependencies.
Create a file named Dockerfile in your project directory with the following content:
# Start from a slim Python base image
FROM python:3.9-slim
# Set the working directory inside the container
WORKDIR /app
# Copy the requirements file into the container
COPY requirements.txt .
# Install the Python dependencies
RUN pip install --no-cache-dir -r requirements.txt
# Copy the rest of the application code into the container
COPY . .
# Expose the port the app runs on
EXPOSE 5000
# Define the command to run the application
CMD ["python", "app.py"]
Now, open your terminal in the iris-k8s-app directory and run the following command to build the image. We will tag it iris-app:v1.
docker build -t iris-app:v1 .
After the build completes, you can verify it by running the container locally.
docker run -p 5000:5000 iris-app:v1
You should see output from Flask indicating the server is running. You can stop it with Ctrl+C.
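One caveat for local clusters: Kubernetes pulls images from the container runtime inside the cluster, not from your host's Docker daemon. If you are using Minikube and the Pods later sit in ErrImagePull or ImagePullBackOff, load the local image into the cluster first, or build the image against Minikube's Docker daemon:

# Option 1: copy the already-built image into the Minikube node
minikube image load iris-app:v1

# Option 2: point your shell at Minikube's Docker daemon, then rebuild
eval $(minikube docker-env)
docker build -t iris-app:v1 .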
A Kubernetes Deployment is a resource object that manages a set of identical Pods. It ensures that a specified number of application replicas are running and handles updates or rollbacks.
Create a file named deployment.yaml with the following content:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: iris-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: iris-server
  template:
    metadata:
      labels:
        app: iris-server
    spec:
      containers:
        - name: iris-app-container
          image: iris-app:v1
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 5000
Let's break down this file:
- replicas: 2: Instructs Kubernetes to maintain two running instances (Pods) of our application for availability.
- selector: This tells the Deployment which Pods to manage. It finds Pods with the label app: iris-server.
- template: This section defines the Pod that will be created. It includes:
  - metadata.labels: A label app: iris-server is attached to each Pod. This is how the selector finds them.
  - spec.containers: Defines the container(s) to run inside the Pod.
    - image: iris-app:v1: Specifies the Docker image to use.
    - imagePullPolicy: IfNotPresent: Tells Kubernetes to use the local image if it exists, rather than trying to pull it from a remote registry. This is useful for local development.
    - containerPort: 5000: Informs Kubernetes that the container listens on port 5000.

Pods in Kubernetes are ephemeral and have internal IP addresses that can change. To provide a stable network endpoint for accessing our application, we use a Service.
We will use the NodePort service type, which exposes the application on a static port on each node in the cluster. This is a straightforward method for accessing an application during development.
Create a file named service.yaml:
apiVersion: v1
kind: Service
metadata:
  name: iris-service
spec:
  type: NodePort
  selector:
    app: iris-server
  ports:
    - protocol: TCP
      port: 80
      targetPort: 5000
Here's what this file defines:
- type: NodePort: Exposes this service on the IP of the cluster nodes at a specific port.
- selector: This is the important link. The service will route traffic to any Pod that has the label app: iris-server, which matches the Pods created by our Deployment.
- ports: This section maps network ports. It states that the service will accept traffic on port 80 and forward it to targetPort: 5000 on the selected Pods.

This captures the relationship between the Kubernetes objects we created: a user sends a request to the stable Service, which then forwards the traffic to one of the Pods managed by the Deployment.
With the image built and the manifest files written, you can now deploy the application. Run the kubectl apply command for each file.
# Apply the deployment configuration
kubectl apply -f deployment.yaml
# Apply the service configuration
kubectl apply -f service.yaml
You can check the status of your resources:
# Check that the deployment is progressing
kubectl get deployment
# Expected output:
# NAME READY UP-TO-DATE AVAILABLE AGE
# iris-deployment 2/2 2 2 15s
# Check that two pods are running
kubectl get pods
# Expected output:
# NAME READY STATUS RESTARTS AGE
# iris-deployment-5c68f6d787-abcde 1/1 Running 0 25s
# iris-deployment-5c68f6d787-fghij 1/1 Running 0 25s
# Check that the service is created
kubectl get service iris-service
# Expected output:
# NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
# iris-service NodePort 10.101.102.103 <none> 80:31234/TCP 35s
Pay attention to the PORT(S) column for the service. The cluster has assigned a high-numbered port (e.g., 31234) which is mapped to the service's internal port 80. This is the port you will use to access the application.
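If you prefer to read the assigned NodePort without scanning the table, or want to confirm that the Service actually found the Pods created by the Deployment, these commands are useful (the exact output will differ in your cluster):

# Print only the NodePort assigned to the service
kubectl get service iris-service -o jsonpath='{.spec.ports[0].nodePort}'

# List the Pod IPs the service is routing traffic to
kubectl get endpoints iris-service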
To test the service, you need the IP address of your cluster node and the NodePort. For local clusters, the IP is usually localhost or can be found with minikube ip.
If using Minikube, you can get the full URL with one command:
minikube service iris-service --url
This will output a URL like http://192.168.49.2:31234.
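If you are not using Minikube, or the node IP is not directly reachable from your machine, kubectl port-forward is an alternative that works on any cluster. It forwards a local port to the Service's port 80 (the command keeps running until you stop it), so the request in the next step would go to http://localhost:8080/predict instead:

kubectl port-forward service/iris-service 8080:80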
Now, use curl to send a POST request with some sample data to your service's /predict endpoint. Use the URL you obtained in the previous step.
curl -X POST \
-H "Content-Type: application/json" \
-d '{"features": [5.1, 3.5, 1.4, 0.2]}' \
http://<YOUR_NODE_IP>:<YOUR_NODE_PORT>/predict
You should receive a JSON response from the model running inside one of your Pods:
{"prediction":"setosa"}
Congratulations, you have successfully containerized a machine learning application and deployed it on Kubernetes.
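Before cleaning up, you can optionally watch the Deployment act as a replica manager. For example, scale to three Pods and observe the new one being created:

kubectl scale deployment iris-deployment --replicas=3
kubectl get pods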
To remove the resources from your cluster, use the kubectl delete command with the same manifest files.
kubectl delete -f service.yaml
kubectl delete -f deployment.yaml
This lab demonstrates the fundamental pattern for deploying ML models as scalable, managed services. By defining your application declaratively in YAML files, you can easily replicate, scale, and manage your deployments in any Kubernetes environment.