Cloud FinOps: Collaborative, Real-Time Cloud Financial Management, J.R. Storment, Mike Fuller, 2019 (O'Reilly Media) - Provides a comprehensive framework for cloud financial management, including principles directly applicable to optimizing AI infrastructure costs.
NVIDIA Nsight Systems User's Guide, NVIDIA Corporation, 2024 (NVIDIA Corporation) - Official documentation for a leading profiling tool critical for understanding and optimizing GPU-accelerated deep learning workloads.
Amazon SageMaker Endpoint Auto Scaling, Amazon Web Services (AWS), 2024 - Official documentation describing how to automatically adjust the number of inference instances based on traffic, a key component of right-sizing for variable inference workloads.