While the previous chapter focused on the mechanics of a single distributed training job, production systems must manage many such jobs concurrently and efficiently. Operating directly on cloud virtual machines presents challenges in scheduling, fault tolerance, and resource utilization. Container orchestration provides the necessary layer of abstraction to solve these problems.
This chapter details the use of Kubernetes for orchestrating large-scale ML workloads. You will configure production-grade features for managing the full lifecycle of models on a shared compute cluster. We begin by using Kubeflow Pipelines to define and automate ML workflows. Next, you will learn to manage specialized hardware through advanced GPU scheduling, including time-slicing and Multi-Instance GPU (MIG) configurations. We then address operational efficiency by configuring cluster autoscaling to match compute supply with workload demand and by developing strategies for using low-cost spot instances. The chapter concludes with methods for implementing multi-tenancy so that multiple teams can share infrastructure securely and within defined resource boundaries.
3.1 Managing ML Workflows with Kubeflow Pipelines
3.2 Advanced GPU Scheduling and Sharing
3.3 Cluster Autoscaling for Dynamic ML Workloads
3.4 Strategies for Using Spot and Preemptible Instances
3.5 Multi-Tenancy with Namespaces, Quotas, and Priority Classes
3.6 Practice: Configure a GPU-Aware Autoscaling Group