safetensors, Hugging Face, 2023 - Explains the design and usage of Safetensors, a format for safe and efficient serialization of large deep learning models.
NVIDIA CUDA Container Images, NVIDIA Corporation, 2024 - Official source for GPU-optimized Docker base images with CUDA and cuDNN, essential for high-performance LLM serving.
Dockerfile best practices, Docker Inc., 2024 - Official guide to creating efficient, secure, and maintainable Docker images, covering multi-stage builds and layer caching.
MLOps Engineering at Scale, Carl Osipov, 2022 (Manning Publications) - A guide to building and deploying ML systems at scale, including model packaging, dependency management, and containerization.