As an ML platform scales to serve multiple teams and projects, contention for valuable resources like GPUs becomes inevitable. A production system cannot operate on a first-come, first-served basis. It requires mechanisms to enforce isolation, guarantee fair resource access, and prioritize important workloads. Kubernetes provides a powerful set of primitives to build a sophisticated multi-tenant environment: Namespaces for isolation, ResourceQuotas for capacity management, and PriorityClasses for scheduling precedence.
The first step in creating a multi-tenant cluster is logical partitioning. A Kubernetes Namespace provides a scope for resources, effectively creating a virtual cluster within your physical cluster. This prevents one team from accidentally interfering with another's work; for example, two teams can each deploy a training job named resnet-finetune without any conflict because they exist in separate namespaces.
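The namespaces themselves are small manifests. As a minimal sketch (the names team-alpha and team-beta are the illustrative team names used throughout this section, and the labels are optional):
# namespaces.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: team-alpha
  labels:
    team: alpha   # optional label, useful later for policies and cost attribution
---
apiVersion: v1
kind: Namespace
metadata:
  name: team-beta
  labels:
    team: beta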
Beyond name scoping, namespaces are the fundamental unit for security and access control. Using Role-Based Access Control (RBAC), you can define permissions on a per-namespace basis. This allows you to grant a data science team full control over their own development namespace while giving them read-only access to a shared production namespace.
Teams are isolated within their own namespaces, preventing resource name collisions and enabling fine-grained access control.
Consider a Role that grants permissions to manage common ML workload resources. This role, defined within the team-alpha namespace, only applies to that specific namespace.
# role-team-alpha.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: team-alpha
  name: ml-developer
rules:
# Full lifecycle management of common ML workload resources in this namespace
- apiGroups: ["", "apps", "batch"]
  resources: ["pods", "deployments", "jobs", "services"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
# Read-only access to pod logs for debugging training runs
- apiGroups: [""]
  resources: ["pods/log"]
  verbs: ["get", "list"]
You then bind this role to a specific user or group using a RoleBinding.
# rolebinding-team-alpha.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-alpha-binding
  namespace: team-alpha
subjects:
- kind: Group
  name: "data-science-alpha" # Group name from your identity provider
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: ml-developer
  apiGroup: rbac.authorization.k8s.io
With this configuration, members of the data-science-alpha group can manage jobs in their namespace but cannot see or affect any resources in the team-beta namespace.
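To cover the read-only access to a shared production namespace mentioned earlier, one common option is to bind the same group to Kubernetes' built-in view ClusterRole with a RoleBinding scoped to that namespace. A minimal sketch, assuming a shared namespace named ml-production (the namespace name is hypothetical):
# rolebinding-prod-readonly.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-alpha-prod-view
  namespace: ml-production   # hypothetical shared production namespace
subjects:
- kind: Group
  name: "data-science-alpha"
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: view                 # built-in read-only ClusterRole
  apiGroup: rbac.authorization.k8s.io
Because the binding lives in the shared namespace, the read-only permissions apply only there; the team's full ml-developer rights remain confined to team-alpha.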
Namespaces provide logical separation, but they do not prevent one team from consuming all available physical resources. A single user in team-alpha could inadvertently request all available GPUs, starving team-beta and other users. This is where ResourceQuota objects are essential.
A ResourceQuota sets hard limits on the total amount of compute and storage resources that can be consumed by all pods within a namespace. For ML platforms, the most important resources to constrain are CPU, memory, and specialized hardware like GPUs.
Here is an example ResourceQuota for a development namespace, capping its total consumption:
# quota-team-alpha.yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-quota
  namespace: team-alpha
spec:
  hard:
    requests.cpu: "20"             # Max 20 CPU cores requested in total
    requests.memory: "100Gi"       # Max 100 GiB of RAM requested in total
    limits.cpu: "40"               # Max 40 CPU cores as hard limits
    limits.memory: "200Gi"         # Max 200 GiB of RAM as hard limits
    requests.nvidia.com/gpu: "4"   # Max 4 GPUs requested in total
    pods: "10"                     # Max 10 pods running at once
When a user in team-alpha tries to create a pod that would exceed these aggregate limits, the Kubernetes API server will reject the request. This ensures that no single namespace can monopolize the cluster's resources, guaranteeing a fair share for all tenants.
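As a sketch of this enforcement, a single pod requesting more GPUs than the quota allows is rejected at admission time, regardless of what else is running in the namespace (the pod name and resource figures are illustrative):
# over-quota-pod.yaml (rejected: 5 GPUs requested against the 4-GPU dev-quota cap)
apiVersion: v1
kind: Pod
metadata:
  name: oversized-training-job
  namespace: team-alpha
spec:
  containers:
  - name: trainer
    image: pytorch/pytorch:latest
    resources:
      requests:
        cpu: "4"
        memory: 16Gi
        nvidia.com/gpu: 5   # exceeds requests.nvidia.com/gpu: "4" in dev-quota
      limits:
        cpu: "8"
        memory: 32Gi
        nvidia.com/gpu: 5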
A related object, LimitRange, enforces minimum, maximum, and default resource requests and limits for individual pods and containers within a namespace. Applying one is good practice: without it, users can create pods that omit resource declarations entirely, which makes scheduling decisions less accurate and, once a quota constrains those resources, causes such pods to be rejected outright.
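A minimal sketch of such a LimitRange follows; the default values are illustrative and should be tuned to your own workloads:
# limitrange-team-alpha.yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: dev-defaults
  namespace: team-alpha
spec:
  limits:
  - type: Container
    defaultRequest:    # injected when a container omits resource requests
      cpu: "1"
      memory: 2Gi
    default:           # injected when a container omits resource limits
      cpu: "2"
      memory: 4Gi
    max:               # ceiling for any single container
      cpu: "8"
      memory: 32Gi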
With quotas in place, you have fair sharing. But not all workloads are created equal. A final model training job for a production release is more important than an experimental data exploration notebook. When the cluster is at capacity, you need a way to ensure high-priority jobs can run, even if it means interrupting less important ones. This is accomplished with PriorityClass.
A PriorityClass is a cluster-scoped object that defines a named priority level. Pods can then reference a PriorityClass by name. The Kubernetes scheduler uses this information in two ways: pending pods with higher priority are placed ahead of lower-priority pods in the scheduling queue, and when the cluster is full, the scheduler can preempt (evict) lower-priority running pods to make room for a higher-priority pod.
Let's define a few priority levels for our ML platform:
# priorityclasses.yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: critical-prod
value: 1000000
globalDefault: false
description: "Used for critical production training and serving workloads."
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority-research
value: 500000
globalDefault: false
description: "Used for important research experiments and validation runs."
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: low-priority-dev
value: 100000
globalDefault: true # Make this the default if none is specified
description: "Used for development, notebooks, and other non-urgent tasks."
Now, a user submitting a production training job can specify the critical-prod priority in their pod specification:
# prod-training-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: prod-training-job-1
  namespace: team-alpha
spec:
  priorityClassName: critical-prod # <-- Reference the PriorityClass
  containers:
  - name: training-container
    image: pytorch/pytorch:latest
    resources:
      requests:
        nvidia.com/gpu: 1
      limits:
        nvidia.com/gpu: 1
If the cluster has no available GPUs, the scheduler will identify a node running a pod with a lower priority (e.g., a low-priority-dev pod) and evict it. The preempted pod is gracefully terminated, allowing the critical-prod pod to be scheduled in its place.
A high-priority pod in the pending queue triggers the preemption of a running low-priority pod to free up necessary GPU resources.
By combining namespaces for isolation, resource quotas for fairness, and priority classes for preemption, you can transform a chaotic, contended cluster into an orderly and efficient multi-tenant ML platform. This structured approach not only improves resource utilization but also provides the predictability and control required for production machine learning operations.