As an ML platform scales to serve multiple teams and projects, contention for valuable resources like GPUs becomes inevitable. A production system cannot operate on a first-come, first-served basis. It requires mechanisms to enforce isolation, guarantee fair resource access, and prioritize important workloads. Kubernetes provides a powerful set of primitives to build a sophisticated multi-tenant environment: Namespaces for isolation, ResourceQuotas for capacity management, and PriorityClasses for scheduling precedence.
The first step in creating a multi-tenant cluster is logical partitioning. A Kubernetes Namespace provides a scope for resources, effectively creating a virtual cluster within your physical cluster. This prevents one team from accidentally interfering with another's work; for example, two teams can each deploy a training job named resnet-finetune without any conflict because they exist in separate namespaces.
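The namespaces themselves are small manifests. As a minimal sketch (the names team-alpha and team-beta are the illustrative team names used throughout this section, and the labels are optional):
# namespaces.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: team-alpha
  labels:
    team: alpha   # optional label, useful later for policies and cost attribution
---
apiVersion: v1
kind: Namespace
metadata:
  name: team-beta
  labels:
    team: beta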
Beyond name scoping, namespaces are the fundamental unit for security and access control. Using Role-Based Access Control (RBAC), you can define permissions on a per-namespace basis. This allows you to grant a data science team full control over their own development namespace while giving them read-only access to a shared production namespace.
Teams are isolated within their own namespaces, preventing resource name collisions and enabling fine-grained access control.
Consider a Role that grants permissions to manage common ML workload resources. This role, defined within the team-alpha namespace, only applies to that specific namespace.
# role-team-alpha.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: team-alpha
  name: ml-developer
rules:
# Full lifecycle management of common ML workload resources in this namespace
- apiGroups: ["", "apps", "batch"]
  resources: ["pods", "deployments", "jobs", "services"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
# Read-only access to pod logs for debugging training runs
- apiGroups: [""]
  resources: ["pods/log"]
  verbs: ["get", "list"]
You then bind this role to a specific user or group using a RoleBinding.
# rolebinding-team-alpha.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-alpha-binding
  namespace: team-alpha
subjects:
- kind: Group
  name: "data-science-alpha" # Group name from your identity provider
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: ml-developer
  apiGroup: rbac.authorization.k8s.io
With this configuration, members of the data-science-alpha group can manage jobs in their namespace but cannot see or affect any resources in the team-beta namespace.
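To cover the read-only access to a shared production namespace mentioned earlier, one common option is to bind the same group to Kubernetes' built-in view ClusterRole with a RoleBinding scoped to that namespace. A minimal sketch, assuming a shared namespace named ml-production (the namespace name is hypothetical):
# rolebinding-prod-readonly.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-alpha-prod-view
  namespace: ml-production   # hypothetical shared production namespace
subjects:
- kind: Group
  name: "data-science-alpha"
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: view                 # built-in read-only ClusterRole
  apiGroup: rbac.authorization.k8s.io
Because the binding lives in the shared namespace, the read-only permissions apply only there; the team's full ml-developer rights remain confined to team-alpha.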
Namespaces provide logical separation, but they do not prevent one team from consuming all available physical resources. A single user in team-alpha could inadvertently request all available GPUs, starving team-beta and other users. This is where ResourceQuota objects are essential.
A ResourceQuota sets hard limits on the total amount of compute and storage resources that can be consumed by all pods within a namespace. For ML platforms, the most important resources to constrain are CPU, memory, and specialized hardware like GPUs.
Here is an example ResourceQuota for a development namespace, capping its total consumption:
# quota-team-alpha.yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-quota
  namespace: team-alpha
spec:
  hard:
    requests.cpu: "20"             # Max 20 CPU cores requested in total
    requests.memory: "100Gi"       # Max 100 GiB of RAM requested in total
    limits.cpu: "40"               # Max 40 CPU cores as hard limits
    limits.memory: "200Gi"         # Max 200 GiB of RAM as hard limits
    requests.nvidia.com/gpu: "4"   # Max 4 GPUs requested in total
    pods: "10"                     # Max 10 pods running at once
When a user in team-alpha tries to create a pod that would exceed these aggregate limits, the Kubernetes API server will reject the request. This ensures that no single namespace can monopolize the cluster's resources, guaranteeing a fair share for all tenants.
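As a sketch of this enforcement, a single pod requesting more GPUs than the quota allows is rejected at admission time, regardless of what else is running in the namespace (the pod name and resource figures are illustrative):
# over-quota-pod.yaml (rejected: 5 GPUs requested against the 4-GPU dev-quota cap)
apiVersion: v1
kind: Pod
metadata:
  name: oversized-training-job
  namespace: team-alpha
spec:
  containers:
  - name: trainer
    image: pytorch/pytorch:latest
    resources:
      requests:
        cpu: "4"
        memory: 16Gi
        nvidia.com/gpu: 5   # exceeds requests.nvidia.com/gpu: "4" in dev-quota
      limits:
        cpu: "8"
        memory: 32Gi
        nvidia.com/gpu: 5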
A related object, LimitRange, enforces minimum, maximum, and default resource requests and limits for individual pods and containers within a namespace. Applying one is good practice: without it, users can create pods that omit resource declarations entirely, which makes scheduling decisions less accurate and, once a quota constrains those resources, causes such pods to be rejected outright.
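A minimal sketch of such a LimitRange follows; the default values are illustrative and should be tuned to your own workloads:
# limitrange-team-alpha.yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: dev-defaults
  namespace: team-alpha
spec:
  limits:
  - type: Container
    defaultRequest:    # injected when a container omits resource requests
      cpu: "1"
      memory: 2Gi
    default:           # injected when a container omits resource limits
      cpu: "2"
      memory: 4Gi
    max:               # ceiling for any single container
      cpu: "8"
      memory: 32Gi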
With quotas in place, you have fair sharing. But not all workloads are created equal. A final model training job for a production release is more important than an experimental data exploration notebook. When the cluster is at capacity, you need a way to ensure high-priority jobs can run, even if it means interrupting less important ones. This is accomplished with PriorityClass.
A PriorityClass is a cluster-scoped object that defines a named priority level. Pods can then reference a PriorityClass by name. The Kubernetes scheduler uses this information in two ways: pending pods with higher priority are placed ahead of lower-priority pods in the scheduling queue, and when the cluster is full, the scheduler can preempt (evict) lower-priority running pods to make room for a higher-priority pod.
Let's define a few priority levels for our ML platform:
# priorityclasses.yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: critical-prod
value: 1000000
globalDefault: false
description: "Used for critical production training and serving workloads."
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority-research
value: 500000
globalDefault: false
description: "Used for important research experiments and validation runs."
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: low-priority-dev
value: 100000
globalDefault: true # Make this the default if none is specified
description: "Used for development, notebooks, and other non-urgent tasks."
Now, a user submitting a production training job can specify the critical-prod priority in their pod specification:
# prod-training-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: prod-training-job-1
  namespace: team-alpha
spec:
  priorityClassName: critical-prod # <-- Reference the PriorityClass
  containers:
  - name: training-container
    image: pytorch/pytorch:latest
    resources:
      requests:
        nvidia.com/gpu: 1
      limits:
        nvidia.com/gpu: 1
If the cluster has no available GPUs, the scheduler will identify a node running a pod with a lower priority (e.g., a low-priority-dev pod) and evict it. The preempted pod is gracefully terminated, allowing the critical-prod pod to be scheduled in its place.
A high-priority pod in the pending queue triggers the preemption of a running low-priority pod to free up necessary GPU resources.
By combining namespaces for isolation, resource quotas for fairness, and priority classes for preemption, you can transform a chaotic, contended cluster into an orderly and efficient multi-tenant ML platform. This structured approach not only improves resource utilization but also provides the predictability and control required for production machine learning operations.