While a standard Kubernetes cluster excels at managing CPU and memory, it doesn't recognize GPUs out of the box. To run accelerated machine learning workloads, you must explicitly enable the cluster to discover, manage, and assign these specialized hardware resources to your containers. This is accomplished using the Kubernetes Device Plugin framework, a system that allows third-party hardware vendors to advertise their resources to the Kubernetes scheduler.
For machine learning, the most common implementation is the NVIDIA Device Plugin. This plugin runs on each GPU-equipped node in your cluster, detects the available NVIDIA GPUs, and reports them as schedulable resources to the kubelet, the primary node agent for Kubernetes. Once the control plane is aware of these resources, you can request them directly in your pod specifications, just as you would for CPU or memory.
The process of exposing and scheduling a GPU involves several interacting components. The device plugin acts as a bridge between the physical hardware and the Kubernetes control plane.
Diagram: Interaction between Kubernetes components for GPU scheduling. The device plugin advertises the GPU to the kubelet, allowing the scheduler to place pods that request GPU resources onto the correct node.
Before the plugin can work, each GPU node must have two essential components installed on the host operating system: the NVIDIA GPU driver, which exposes the hardware to the operating system, and the NVIDIA Container Toolkit, which configures the container runtime (such as containerd or Docker) so containers can access the GPU devices and driver libraries.
Without these prerequisites, the device plugin may run, but your pods will fail to start or be unable to access the GPU at runtime.
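If you are unsure whether a node meets these prerequisites, a quick host-level check is usually enough. The sketch below is illustrative and makes a few assumptions: you have shell access to the node, the runtime is containerd with its default config path, and the installed Container Toolkit is recent enough to ship the nvidia-ctk CLI.

# Run these commands on the GPU node itself, not inside a pod

# 1. Driver check: should list the GPUs and the installed driver version
nvidia-smi

# 2. Container Toolkit check (nvidia-ctk ships with recent toolkit versions)
nvidia-ctk --version

# 3. Runtime check: the containerd config should reference the NVIDIA runtime
#    (assumes the default config path; adjust for your distribution)
grep -i nvidia /etc/containerd/config.toml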
The standard way to deploy the NVIDIA Device Plugin is by using a Kubernetes DaemonSet. A DaemonSet ensures that a copy of the plugin's pod runs on every node in the cluster (or on a subset of nodes you specify). When a new node with GPUs joins the cluster, the DaemonSet automatically deploys the plugin pod to it.
Here is a simplified YAML manifest for deploying the NVIDIA device plugin:
# nvidia-device-plugin-ds.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nvidia-device-plugin-daemonset
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: nvidia-device-plugin-ds
  template:
    metadata:
      labels:
        name: nvidia-device-plugin-ds
    spec:
      # Allow the plugin to run on control-plane nodes if they have GPUs,
      # and on nodes tainted for GPU workloads
      tolerations:
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule
      containers:
        - image: nvcr.io/nvidia/k8s-device-plugin:v0.14.1
          name: nvidia-device-plugin-ctr
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop: ["ALL"]
          # Mount the kubelet's device-plugin directory so the plugin can
          # register itself and advertise GPUs to the kubelet
          volumeMounts:
            - name: device-plugin
              mountPath: /var/lib/kubelet/device-plugins
      volumes:
        - name: device-plugin
          hostPath:
            path: /var/lib/kubelet/device-plugins
You can apply this manifest using kubectl apply -f nvidia-device-plugin-ds.yaml. Once deployed, you can verify that the plugin is running on your GPU-enabled nodes with kubectl get pods -n kube-system -o wide.
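For example, the following commands deploy the plugin and confirm that one plugin pod is running per GPU node; the pod name, node name, and IP in the sample output are illustrative.

# Deploy the device plugin DaemonSet
$ kubectl apply -f nvidia-device-plugin-ds.yaml

# Check that a plugin pod is running on each GPU node
$ kubectl get pods -n kube-system -o wide | grep nvidia-device-plugin

# Illustrative output (one line per GPU node)
nvidia-device-plugin-daemonset-7x2kq   1/1   Running   0   2m   10.244.1.3   gpu-node-01   <none>   <none>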
With the plugin running, your cluster now treats GPUs as a finite resource type identified by nvidia.com/gpu. To schedule a pod that uses a GPU, you must request it in the resources section of the container specification.
Let's look at a Pod manifest for a PyTorch application that requires one GPU:
# pytorch-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pytorch-gpu-pod
spec:
  containers:
    - name: pytorch-container
      image: pytorch/pytorch:2.1.0-cuda11.8-cudnn8-runtime
      # Command to keep the container running and test GPU access
      command: ["/bin/sh", "-c"]
      args: ["python -c 'import torch; print(f\"CUDA available: {torch.cuda.is_available()}\")'; sleep 3600"]
      resources:
        limits:
          # This is the resource request for one NVIDIA GPU
          nvidia.com/gpu: 1
When you apply this manifest, the Kubernetes scheduler will only consider nodes that have at least one allocatable nvidia.com/gpu resource. If a suitable node is found, the scheduler assigns the pod to it, and the kubelet on that node allocates a specific GPU to the container before starting it. Note that extended resources such as nvidia.com/gpu are specified only under limits; Kubernetes treats the request as equal to the limit, and GPUs cannot be overcommitted or shared between containers through this mechanism.
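To see this end to end, you can apply the manifest and check where the pod landed and what it printed. The node name and pod IP in the sample output below are placeholders.

# Create the pod and confirm it was scheduled onto a GPU node
$ kubectl apply -f pytorch-pod.yaml
$ kubectl get pod pytorch-gpu-pod -o wide

# Illustrative output
NAME              READY   STATUS    RESTARTS   AGE   IP           NODE
pytorch-gpu-pod   1/1     Running   0          45s   10.244.1.8   gpu-node-01

# The container's test command prints whether PyTorch can see the GPU
$ kubectl logs pytorch-gpu-pod
CUDA available: True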
You can confirm that Kubernetes is aware of your node's GPUs by running kubectl describe node <your-gpu-node-name>. Look for nvidia.com/gpu under the Capacity and Allocatable sections:
Capacity:
  cpu:                96
  ephemeral-storage:  99264Mi
  memory:             527699Mi
  nvidia.com/gpu:     4
  pods:               110
Allocatable:
  cpu:                96
  ephemeral-storage:  91497Mi
  memory:             517200Mi
  nvidia.com/gpu:     4
  pods:               110
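To survey GPU capacity across the whole cluster rather than inspecting nodes one at a time, a custom-columns query is a convenient sketch; the node names in the output below are examples.

# List allocatable GPUs per node (the dots in nvidia.com must be escaped)
$ kubectl get nodes -o custom-columns='NODE:.metadata.name,GPUS:.status.allocatable.nvidia\.com/gpu'

# Illustrative output
NODE          GPUS
cpu-node-01   <none>
gpu-node-01   4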
After deploying your pod, you can confirm it is using the GPU by executing nvidia-smi inside the running container.
# Execute nvidia-smi inside the pod
$ kubectl exec -it pytorch-gpu-pod -- nvidia-smi
# Expected output showing the GPU and a running process
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-SXM...  On   | 00000000:07:00.0 Off |                    0 |
| N/A   32C    P0    56W / 400W |      1MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
...
The presence of this output confirms that the container has successfully been granted access to the GPU, bridging the gap between your containerized ML application and the underlying hardware. This setup is fundamental for building scalable and efficient training and inference systems on Kubernetes.