ZeRO: Memory Optimizations Toward Training Trillion Parameter Models, Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, Yuxiong He, 2020. SC '20: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (IEEE Press). DOI: 10.1109/SC41405.2020.00024 - Describes the ZeRO concept and its three stages (partitioning optimizer states, then gradients, then parameters), explaining the memory-saving mechanisms.
DeepSpeed ZeRO-powered Data Parallelism, DeepSpeed Team, 2024 - Official resource for configuring and using ZeRO optimizations within the DeepSpeed framework (see the configuration sketch after this list).
DeepSpeed: System Optimizations Enable Training Deep Learning Models with Over 100 Billion Parameters, Jeff Rasley, Samyam Rajbhandari, Olatunji Ruwase, Yuxiong He, 2020. KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (ACM). DOI: 10.1145/3394486.3406703 - Presents the DeepSpeed framework, including ZeRO as a core component, and its capabilities for training large-scale models.
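
To orient readers of the second reference, here is a minimal sketch of how a ZeRO stage is selected through a DeepSpeed config. The batch size, learning rate, and offload settings are illustrative assumptions, not values taken from the cited sources; consult the DeepSpeed documentation above for the full config schema.

```python
# Minimal sketch (illustrative values, not from the cited sources) of
# enabling ZeRO through a DeepSpeed config dictionary.
import torch
import deepspeed

model = torch.nn.Linear(1024, 1024)  # stand-in for a real model

ds_config = {
    "train_batch_size": 32,              # assumed global batch size
    "fp16": {"enabled": True},           # mixed-precision training
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "zero_optimization": {
        "stage": 2,                      # 1: optimizer states; 2: + gradients; 3: + parameters
        "offload_optimizer": {"device": "cpu"},  # optional ZeRO-Offload to CPU memory
    },
}

# deepspeed.initialize returns an engine that applies the configured ZeRO
# partitioning; it expects to run under the `deepspeed` launcher on GPUs.
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```

The `stage` field is where the three ZeRO stages from the first reference map onto the framework: each higher stage partitions one more of optimizer states, gradients, and parameters across the data-parallel ranks, trading communication for memory.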