Hands-on Practical: Distributed Training with PyTorch FSDP
Fully Sharded Data Parallel (FSDP), PyTorch Team, 2024 (PyTorch) - Official documentation for PyTorch's FSDP, detailing its API, usage, and configuration options for distributed training (a minimal wrapping sketch follows this list).
ZeRO: Memory Optimizations Toward Training Trillion Parameter Models, Samyam Rajbhandari, Cong Guo, Jeff Rasley, Shaden Smith, Yuxiong He, 2021, SC21: International Conference for High Performance Computing, Networking, Storage and Analysis (ACM), DOI: 10.1145/3479069.3487532 - Introduces the ZeRO optimization stages (ZeRO-2 and ZeRO-3) for memory-efficient large-model training, to which FSDP's sharding strategies are analogous (see the wrapping sketch after this list).
Distributed communication package - torch.distributed, PyTorch Team, 2024 (PyTorch) - Official documentation for PyTorch's distributed communication primitives, including init_process_group, torchrun, and the environment variables they rely on (see the launch sketch after this list).
Distributed training with 🤗 Accelerate, Hugging Face, 2024 (Hugging Face) - Practical guide on using Accelerate to simplify distributed training of transformer models, including its FSDP integration (see the Accelerate sketch after this list).
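To tie the FSDP documentation and the ZeRO paper together, here is a minimal sketch of wrapping a model with FSDP and picking a sharding strategy. The toy TinyModel, the hyperparameters, and the random training step are illustrative assumptions, not something prescribed by the referenced documents; the key point is that ShardingStrategy.FULL_SHARD (the default) shards parameters, gradients, and optimizer state (ZeRO-3-like), while SHARD_GRAD_OP keeps full parameters on every rank and shards only gradients and optimizer state (ZeRO-2-like).

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, ShardingStrategy


class TinyModel(nn.Module):
    """Stand-in model; replace with your transformer."""

    def __init__(self, dim: int = 1024):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        return self.net(x)


def main():
    # torchrun sets RANK / WORLD_SIZE / LOCAL_RANK; NCCL is the usual GPU backend.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = TinyModel().cuda()

    # FULL_SHARD shards parameters, gradients, and optimizer state across ranks
    # (analogous to ZeRO-3); ShardingStrategy.SHARD_GRAD_OP would shard only
    # gradients and optimizer state (analogous to ZeRO-2).
    fsdp_model = FSDP(model, sharding_strategy=ShardingStrategy.FULL_SHARD)

    optimizer = torch.optim.AdamW(fsdp_model.parameters(), lr=1e-4)

    # One illustrative training step on random data.
    x = torch.randn(8, 1024, device="cuda")
    loss = fsdp_model(x).pow(2).mean()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

A script like this is launched with one process per GPU, e.g. via torchrun as shown in the next sketch.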
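The torch.distributed entry above covers the process-group setup every FSDP script depends on. Below is a minimal sketch of reading the environment variables that torchrun exports and initializing the process group; the script name train_fsdp.py in the comment is an assumption, and the all_reduce at the end is just a sanity check, not part of a real training loop.

```python
import os

import torch
import torch.distributed as dist

# torchrun exports these variables for every worker process, e.g. for a
# single node with 4 GPUs:
#   torchrun --standalone --nproc_per_node=4 train_fsdp.py
rank = int(os.environ["RANK"])              # global rank across all nodes
world_size = int(os.environ["WORLD_SIZE"])  # total number of processes
local_rank = int(os.environ["LOCAL_RANK"])  # rank within this node

# With the default env:// init method, init_process_group also reads
# MASTER_ADDR and MASTER_PORT from the environment.
dist.init_process_group(backend="nccl", rank=rank, world_size=world_size)
torch.cuda.set_device(local_rank)

# Sanity check: every rank contributes 1, so the reduced value equals world_size.
ones = torch.ones(1, device="cuda")
dist.all_reduce(ones, op=dist.ReduceOp.SUM)
if rank == 0:
    print(f"world_size={world_size}, all_reduce result={ones.item()}")

dist.destroy_process_group()
```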
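For the Accelerate guide above, here is a minimal sketch of the same kind of training step expressed through Accelerator. FSDP itself is typically enabled via the questions in `accelerate config` (or an FSDP plugin) rather than inside the script; the toy model, data, and loss here are assumptions chosen only to make the example self-contained.

```python
import torch
import torch.nn as nn
from accelerate import Accelerator
from torch.utils.data import DataLoader, TensorDataset

# Accelerator picks up the distributed configuration (including any FSDP
# options) written by `accelerate config`; the loop itself stays device-agnostic.
accelerator = Accelerator()

model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 1))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
dataset = TensorDataset(torch.randn(256, 1024), torch.randn(256, 1))
dataloader = DataLoader(dataset, batch_size=8, shuffle=True)

# prepare() wraps the model (with FSDP when configured), shards the dataloader
# across ranks, and moves everything to the right device.
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

loss_fn = nn.MSELoss()
model.train()
for inputs, targets in dataloader:
    outputs = model(inputs)
    loss = loss_fn(outputs, targets)
    accelerator.backward(loss)  # replaces loss.backward()
    optimizer.step()
    optimizer.zero_grad()

if accelerator.is_main_process:
    print("finished one epoch")
```

Such a script is started with `accelerate launch` (script name assumed), after `accelerate config` has been answered with the desired FSDP settings.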