While a standard Kubernetes cluster excels at managing CPU and memory, it doesn't recognize GPUs out of the box. To run accelerated machine learning workloads, you must explicitly enable the cluster to discover, manage, and assign these specialized hardware resources to your containers. This is accomplished using the Kubernetes Device Plugin framework, a system that allows third-party hardware vendors to advertise their resources to the Kubernetes scheduler.
For machine learning, the most common implementation is the NVIDIA Device Plugin. This plugin runs on each GPU-equipped node in your cluster, detects the available NVIDIA GPUs, and reports them as schedulable resources to the kubelet, the primary node agent for Kubernetes. Once the control plane is aware of these resources, you can request them directly in your pod specifications, just as you would for CPU or memory.
The process of exposing and scheduling a GPU involves several interacting components. The device plugin acts as a bridge between the physical hardware and the Kubernetes control plane.
Diagram: Interaction between Kubernetes components for GPU scheduling. The device plugin advertises the GPU to the kubelet, allowing the scheduler to place pods that request GPU resources onto the correct node.
Before the plugin can work, each GPU node must have two essential components installed on the host operating system: the NVIDIA GPU driver, which exposes the hardware to the operating system, and the NVIDIA Container Toolkit, which configures the container runtime (such as containerd or Docker) so containers can access the GPU devices and driver libraries.
Without these prerequisites, the device plugin may run, but your pods will fail to start or be unable to access the GPU at runtime.
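If you are unsure whether a node meets these prerequisites, a quick host-level check is usually enough. The sketch below is illustrative and makes a few assumptions: you have shell access to the node, the runtime is containerd with its default config path, and the installed Container Toolkit is recent enough to ship the nvidia-ctk CLI.

# Run these commands on the GPU node itself, not inside a pod

# 1. Driver check: should list the GPUs and the installed driver version
nvidia-smi

# 2. Container Toolkit check (nvidia-ctk ships with recent toolkit versions)
nvidia-ctk --version

# 3. Runtime check: the containerd config should reference the NVIDIA runtime
#    (assumes the default config path; adjust for your distribution)
grep -i nvidia /etc/containerd/config.toml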
The standard way to deploy the NVIDIA Device Plugin is by using a Kubernetes DaemonSet. A DaemonSet ensures that a copy of the plugin's pod runs on every node in the cluster (or on a subset of nodes you specify). When a new node with GPUs joins the cluster, the DaemonSet automatically deploys the plugin pod to it.
Here is a simplified YAML manifest for deploying the NVIDIA device plugin:
# nvidia-device-plugin-ds.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nvidia-device-plugin-daemonset
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: nvidia-device-plugin-ds
  template:
    metadata:
      labels:
        name: nvidia-device-plugin-ds
    spec:
      # Allow the plugin to run on control-plane nodes if they have GPUs,
      # and on nodes tainted for GPU workloads
      tolerations:
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule
      containers:
        - image: nvcr.io/nvidia/k8s-device-plugin:v0.14.1
          name: nvidia-device-plugin-ctr
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop: ["ALL"]
          # Mount the kubelet's device-plugin directory so the plugin can
          # register itself and advertise GPUs to the kubelet
          volumeMounts:
            - name: device-plugin
              mountPath: /var/lib/kubelet/device-plugins
      volumes:
        - name: device-plugin
          hostPath:
            path: /var/lib/kubelet/device-plugins
You can apply this manifest using kubectl apply -f nvidia-device-plugin-ds.yaml. Once deployed, you can verify that the plugin is running on your GPU-enabled nodes with kubectl get pods -n kube-system -o wide.
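For example, the following commands deploy the plugin and confirm that one plugin pod is running per GPU node; the pod name, node name, and IP in the sample output are illustrative.

# Deploy the device plugin DaemonSet
$ kubectl apply -f nvidia-device-plugin-ds.yaml

# Check that a plugin pod is running on each GPU node
$ kubectl get pods -n kube-system -o wide | grep nvidia-device-plugin

# Illustrative output (one line per GPU node)
nvidia-device-plugin-daemonset-7x2kq   1/1   Running   0   2m   10.244.1.3   gpu-node-01   <none>   <none>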
With the plugin running, your cluster now treats GPUs as a finite resource type identified by nvidia.com/gpu. To schedule a pod that uses a GPU, you must request it in the resources section of the container specification.
Let's look at a Pod manifest for a PyTorch application that requires one GPU:
# pytorch-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pytorch-gpu-pod
spec:
  containers:
    - name: pytorch-container
      image: pytorch/pytorch:2.1.0-cuda11.8-cudnn8-runtime
      # Command to keep the container running and test GPU access
      command: ["/bin/sh", "-c"]
      args: ["python -c 'import torch; print(f\"CUDA available: {torch.cuda.is_available()}\")'; sleep 3600"]
      resources:
        limits:
          # This is the resource request for one NVIDIA GPU
          nvidia.com/gpu: 1
When you apply this manifest, the Kubernetes scheduler will only consider nodes that have at least one allocatable nvidia.com/gpu resource. If a suitable node is found, the scheduler assigns the pod to it, and the kubelet on that node allocates a specific GPU to the container before starting it. Note that extended resources such as nvidia.com/gpu are specified only under limits; Kubernetes treats the request as equal to the limit, and GPUs cannot be overcommitted or shared between containers through this mechanism.
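To see this end to end, you can apply the manifest and check where the pod landed and what it printed. The node name and pod IP in the sample output below are placeholders.

# Create the pod and confirm it was scheduled onto a GPU node
$ kubectl apply -f pytorch-pod.yaml
$ kubectl get pod pytorch-gpu-pod -o wide

# Illustrative output
NAME              READY   STATUS    RESTARTS   AGE   IP           NODE
pytorch-gpu-pod   1/1     Running   0          45s   10.244.1.8   gpu-node-01

# The container's test command prints whether PyTorch can see the GPU
$ kubectl logs pytorch-gpu-pod
CUDA available: True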
You can confirm that Kubernetes is aware of your node's GPUs by running kubectl describe node <your-gpu-node-name>. Look for nvidia.com/gpu under the Capacity and Allocatable sections:
Capacity:
  cpu:                96
  ephemeral-storage:  99264Mi
  memory:             527699Mi
  nvidia.com/gpu:     4
  pods:               110
Allocatable:
  cpu:                96
  ephemeral-storage:  91497Mi
  memory:             517200Mi
  nvidia.com/gpu:     4
  pods:               110
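To survey GPU capacity across the whole cluster rather than inspecting nodes one at a time, a custom-columns query is a convenient sketch; the node names in the output below are examples.

# List allocatable GPUs per node (the dots in nvidia.com must be escaped)
$ kubectl get nodes -o custom-columns='NODE:.metadata.name,GPUS:.status.allocatable.nvidia\.com/gpu'

# Illustrative output
NODE          GPUS
cpu-node-01   <none>
gpu-node-01   4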
After deploying your pod, you can confirm it is using the GPU by executing nvidia-smi inside the running container.
# Execute nvidia-smi inside the pod
$ kubectl exec -it pytorch-gpu-pod -- nvidia-smi
# Expected output showing the GPU and a running process
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-SXM...  On   | 00000000:07:00.0 Off |                    0 |
| N/A   32C    P0    56W / 400W |      1MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
...
The presence of this output confirms that the container has successfully been granted access to the GPU, bridging the gap between your containerized ML application and the underlying hardware. This setup is fundamental for building scalable and efficient training and inference systems on Kubernetes.