Horizontal Pod Autoscaler, Kubernetes Documentation, 2025 - Official documentation explaining the core principles and configuration of the Kubernetes Horizontal Pod Autoscaler, including the custom and external metrics that are crucial for LLM serving.
KEDA Documentation, KEDA Community, 2024 - Official documentation for KEDA (Kubernetes Event-driven Autoscaling), detailing its capabilities for scaling based on various event sources and enabling scale-to-zero for cost optimization.
What is Amazon EC2 Auto Scaling?, Amazon Web Services, 2024 - Official documentation explaining the fundamental concepts of dynamic resource adjustment in a major cloud environment, applicable to managing GPU instances for LLM serving.
Designing Machine Learning Systems, Chip Huyen, 2022 (O'Reilly Media) - A comprehensive guide to building production-ready machine learning systems, including strategies for deploying, monitoring, and efficiently scaling models.