Utilizing Frameworks like DeepSpeed and Megatron-LM
ZeRO: Memory Optimizations Toward Training Trillion Parameter Models, Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, Yuxiong He, 2020. SC20: International Conference for High Performance Computing, Networking, Storage and Analysis. DOI: 10.1109/SC41405.2020.00072 - The foundational paper introducing the Zero Redundancy Optimizer (ZeRO), which is central to DeepSpeed's memory-efficiency techniques.
DeepSpeed Documentation, Microsoft DeepSpeed Team, 2024 - Official documentation providing practical guides, API references, and configuration examples for DeepSpeed features, including ZeRO stages and offloading (a minimal configuration sketch follows this list).
NVIDIA/Megatron-LM GitHub Repository, NVIDIA, 2024 - The official source code repository for Megatron-LM, offering implementation details, usage examples, and configuration patterns for advanced parallelism strategies (a launch sketch follows this list).
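To make the DeepSpeed entry concrete, here is a minimal, illustrative sketch of enabling ZeRO stage 3 with CPU optimizer offloading through deepspeed.initialize. The Linear model, batch size, and learning rate are placeholders rather than recommendations; exact configuration keys should be checked against the DeepSpeed documentation for your version, and the script is assumed to run under the deepspeed launcher.

    import torch
    import deepspeed

    # Placeholder model standing in for a real transformer.
    model = torch.nn.Linear(1024, 1024)

    # Illustrative config: ZeRO stage 3 partitions optimizer states,
    # gradients, and parameters across data-parallel ranks;
    # offload_optimizer moves optimizer state into CPU memory.
    ds_config = {
        "train_batch_size": 32,
        "fp16": {"enabled": True},
        "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
        "zero_optimization": {
            "stage": 3,
            "offload_optimizer": {"device": "cpu"},
        },
    }

    # deepspeed.initialize wraps the model in an engine whose
    # backward()/step() manage the partitioned states transparently.
    engine, optimizer, _, _ = deepspeed.initialize(
        model=model,
        model_parameters=model.parameters(),
        config=ds_config,
    )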
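Likewise, for the Megatron-LM entry, a hedged sketch of launching the repository's pretrain_gpt.py with tensor and pipeline parallelism, wrapped in Python only for consistency with the sketch above. The GPU count, parallel sizes, and batch sizes are hypothetical, and flag names should be verified against the repository for your checkout.

    import subprocess

    # Hypothetical 8-GPU launch: tensor-parallel groups of 2 and a
    # 2-stage pipeline use 2 x 2 = 4 model-parallel ranks, leaving a
    # data-parallel width of 8 / 4 = 2.
    cmd = [
        "torchrun", "--nproc_per_node=8",
        "pretrain_gpt.py",                      # training script in the Megatron-LM repo
        "--tensor-model-parallel-size", "2",    # split each layer across 2 GPUs
        "--pipeline-model-parallel-size", "2",  # split the layer stack into 2 stages
        "--micro-batch-size", "4",
        "--global-batch-size", "64",
        # model-size, data, and tokenizer flags omitted in this sketch
    ]
    subprocess.run(cmd, check=True)

In this hypothetical layout the global batch of 64 is consistent with the parallelism: 2 data-parallel replicas x micro-batch 4 x 8 gradient-accumulation steps.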