CUDA semantics, PyTorch Documentation Team, 2016 (PyTorch) - Official documentation on PyTorch's CUDA memory management functions, including memory_allocated and memory_reserved, directly referenced in the section.
NVIDIA System Management Interface (nvidia-smi), NVIDIA Corporation, 2024 (NVIDIA) - Official resource for the nvidia-smi command-line utility, a tool for monitoring GPU memory usage discussed in the section.
Quantization - Hugging Face Transformers, Hugging Face, 2024 (Hugging Face) - Provides practical guidance and implementations for quantization techniques (including 8-bit and 4-bit) used when deploying LLMs, which affect both disk footprint and runtime memory.