Optimizing Latency and Cost of Deep Learning Inference in Serverless Functions, Ioannis Anagnostopoulos, Michail Gkagkas, Oana Balalau, Anne-Marie Kermarrec, and Konstantinos G. Stavropoulos, 2021, Proceedings of the 3rd Workshop on Machine Learning and Systems (LearningSys '21) (ACM), DOI: 10.1145/3468791.3469147 - Examines methods to simultaneously reduce latency and manage costs for deep learning inference on serverless platforms, including cold-start overhead and resource allocation.
Deploying machine learning models with GPU on Cloud Run, Google Cloud Documentation, 2024 (Google Cloud) - Official documentation providing practical guidance on configuring and deploying machine learning models that require GPU acceleration on Google Cloud's serverless platform, Cloud Run.