Kubernetes is a system for managing containerized applications. To define and run machine learning workloads on it, you need to understand its fundamental objects. Although Kubernetes includes many components, your interaction will primarily involve three objects: Pods, Deployments, and Services. These three objects provide the foundation for building scalable and resilient machine learning systems on the platform.
In the Kubernetes environment, the smallest and most basic deployable object is not a container, but a Pod. A Pod represents a single instance of a running process in your cluster and encapsulates one or more containers. It also includes shared storage and network resources for those containers.
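To make this concrete, here is a minimal sketch of a standalone Pod manifest. The file name and Pod name are illustrative placeholders; the container image reuses the example from the Deployment manifest later in this section.

# pod.yaml (illustrative sketch; in practice you rarely create bare Pods)
apiVersion: v1
kind: Pod
metadata:
  name: model-server-pod
  labels:
    app: inference-api
spec:
  containers:
  - name: model-server
    image: your-repo/your-model-server:v1.2
    ports:
    - containerPort: 8000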
While you can run multiple containers inside a single Pod, the common practice is to have one main application container per Pod. This "one-process-per-container" approach keeps your architecture clean. So, when would you use multiple containers? A typical use case in ML is the "sidecar" pattern. Imagine you have a container serving your model. You can add a second, "sidecar" container to the same Pod that handles a secondary task, like fetching model updates from cloud storage or shipping logs to a central collector. Because containers in a Pod share the same network namespace and can share storage volumes, they can communicate with each other efficiently.
A Pod containing a primary application container and a sidecar for a supporting task.
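As a rough sketch of this pattern, the manifest below pairs a model-serving container with a log-shipping sidecar that share an emptyDir volume. The file name, Pod name, and sidecar image are hypothetical and only meant to show the shape of a multi-container Pod.

# pod-with-sidecar.yaml (illustrative sketch, not a production manifest)
apiVersion: v1
kind: Pod
metadata:
  name: inference-with-sidecar
spec:
  volumes:
  - name: shared-logs              # volume shared by both containers
    emptyDir: {}
  containers:
  - name: model-server             # primary container serving the model
    image: your-repo/your-model-server:v1.2
    ports:
    - containerPort: 8000
    volumeMounts:
    - name: shared-logs
      mountPath: /var/log/app
  - name: log-shipper              # sidecar that ships logs to a central collector
    image: your-repo/log-shipper:latest
    volumeMounts:
    - name: shared-logs
      mountPath: /var/log/app
      readOnly: true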
An important characteristic of Pods is that they are considered ephemeral. They can be terminated due to a node failure, resource constraints, or during an application update. When a Pod is destroyed, it is gone for good, along with its unique IP address. This transient nature means you should never create and manage individual Pods directly for any application that needs to be reliable. Instead, you use a higher-level controller, like a Deployment, to manage them for you.
A Deployment is a controller that provides declarative updates for Pods. You describe the desired state in a Deployment object, and the Deployment controller works to change the actual state to the desired state at a controlled rate.
Its primary functions are:

- Replication: keeping a specified number of identical Pod replicas running at all times.
- Self-healing: replacing Pods that crash or are lost when a node fails.
- Rolling updates: gradually moving Pods to a new container image version without downtime.
- Rollback: reverting to a previous revision if an update misbehaves.
Under the hood, a Deployment manages a ReplicaSet, which is the object responsible for maintaining the specified number of replicas. You almost never interact with ReplicaSets directly, as Deployments orchestrate them to handle versioning and updates.
Here is a simplified example of a Deployment manifest in YAML. This file declaratively defines a desired state: three replicas of a model-serving application.
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inference-api-deployment
spec:
  replicas: 3                  # 1. We want 3 identical Pods running
  selector:
    matchLabels:
      app: inference-api       # 2. The Deployment finds Pods with this label
  template:                    # 3. This is the blueprint for the Pods it creates
    metadata:
      labels:
        app: inference-api     # 4. The Pods get this label
    spec:
      containers:
      - name: model-server
        image: your-repo/your-model-server:v1.2   # 5. The container image to run
        ports:
        - containerPort: 8000
- replicas: Tells the Deployment to maintain three running instances of our Pod.
- selector: Defines how the Deployment finds which Pods to manage. It looks for Pods with a label that matches app: inference-api.
- template: A blueprint for the Pods that the Deployment will create. It contains its own metadata and spec.
- template.metadata.labels: The labels applied to each Pod created by this Deployment. The selector must match these labels.
- template.spec.containers: The list of containers to run inside the Pod. Here, we define a single container named model-server using a specific image version.

We've established that Deployments manage a set of identical, yet ephemeral, Pods. Each Pod has its own internal IP address that can change whenever it is recreated. This presents a problem: how can other applications, or external users, reliably connect to your model-serving Pods if their addresses are constantly changing?
This is where the Service object comes in. A Service provides a stable, abstract endpoint for a set of Pods. It gets a persistent IP address and a DNS name within the cluster. When traffic is sent to the Service, it acts as an internal load balancer, automatically routing the request to one of the healthy Pods that it targets. A Service finds its target Pods using the same label and selector mechanism as a Deployment.
Kubernetes offers several types of Services, but for ML applications, you will most often use these two:
- ClusterIP: The default Service type. It exposes the Service on an internal IP that is only reachable from within the cluster, which makes it the right choice for internal traffic, such as another application or a training job calling your inference Pods.
- LoadBalancer: When you create a LoadBalancer Service on a cloud platform like AWS, GCP, or Azure, Kubernetes will automatically provision an external load balancer and configure it to route external internet traffic to your Service's Pods. This is the standard way to expose a production inference API to the public.

The diagram below illustrates how these components work together. A Deployment ensures three Pods are running. A LoadBalancer Service provides a single, stable entry point for external traffic and distributes requests among the Pods. A sketch of a matching Service manifest follows the diagram.
An external request is sent to a stable LoadBalancer IP, which the Service routes to one of the available Pods matching its label selector. The Deployment ensures the desired number of Pods are always running.
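As a rough sketch under the assumptions of the Deployment example above, a matching LoadBalancer Service manifest could look like the following. The Service name and port numbers are illustrative; the selector must match the app: inference-api label from the Pod template.

# service.yaml (illustrative sketch)
apiVersion: v1
kind: Service
metadata:
  name: inference-api-service
spec:
  type: LoadBalancer             # ask the cloud provider for an external load balancer
  selector:
    app: inference-api           # targets the Pods created by the Deployment above
  ports:
  - protocol: TCP
    port: 80                     # port exposed by the Service
    targetPort: 8000             # containerPort on the model-server Pods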
Together, Deployments and Services form the backbone of a scalable application on Kubernetes. The Deployment handles the lifecycle and availability of your model-serving Pods, while the Service provides a stable network endpoint to make them accessible.