Big Model Inference, Hugging Face, 2024 (Hugging Face) - Official documentation for Hugging Face Accelerate, explaining practical methods for offloading large model parameters to CPU memory and disk, enabling inference on hardware whose GPU memory cannot hold the full model.
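As a quick illustration of the workflow this page documents, here is a minimal sketch using the `device_map="auto"` path through `transformers`. The model ID and the `offload` folder name are placeholders, not values from the cited page; any causal-LM checkpoint works the same way.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bigscience/bloom-3b"  # placeholder; substitute any causal-LM checkpoint

# device_map="auto" lets Accelerate split the weights across GPU, CPU RAM,
# and disk: layers that fit nowhere in memory are written to offload_folder
# and streamed back in during the forward pass.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    offload_folder="offload",   # disk spill location for overflow weights
    torch_dtype=torch.float16,  # halve the memory footprint of each weight
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Inputs can stay on CPU: Accelerate's dispatch hooks move each layer's
# inputs to that layer's execution device automatically.
inputs = tokenizer("Offloading makes big models fit because", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```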
GPUDirect Storage, NVIDIA, 2024 (NVIDIA) - NVIDIA's technical overview of GPUDirect Storage, explaining how it establishes a direct DMA path between NVMe storage and GPU memory, bypassing the CPU bounce buffer and raising transfer bandwidth for offloaded data.
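One way to exercise GPUDirect Storage from Python is NVIDIA's kvikio bindings, a wrapper around the cuFile API described in the cited overview. The sketch below assumes kvikio and CuPy are installed and that a raw float32 dump exists at the placeholder path; the file name and buffer size are illustrative only. When the GDS driver is unavailable, kvikio falls back to a compatibility-mode read through host memory.

```python
import cupy
import kvikio

path = "weights.bin"   # placeholder: raw float32 dump of one weight tensor
n_elems = 4096 * 4096  # placeholder element count

buf = cupy.empty(n_elems, dtype=cupy.float32)  # destination buffer in GPU memory

# read() issues a cuFileRead under the hood: with GDS active, the NVMe
# controller DMAs the bytes directly into the CuPy buffer, skipping host RAM.
f = kvikio.CuFile(path, "r")
nbytes = f.read(buf)
f.close()
print(f"read {nbytes} bytes directly into GPU memory")
```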