A Python toolkit for building production-ready LLM applications, with modular utilities for prompts, RAG, agents, structured outputs, and multi-provider support.
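To make the "structured outputs" idea concrete, here is a minimal, self-contained sketch of the underlying pattern: validate a model's JSON reply against a declared schema before trusting it. Everything below (the `TicketTriage` schema, the `parse_structured` helper, and the canned reply) is a hypothetical illustration, not this toolkit's actual API.

```python
# Hypothetical sketch of a structured-output utility; none of these names
# come from the toolkit itself.
import json
from dataclasses import dataclass, fields

@dataclass
class TicketTriage:  # illustrative schema for a support-ticket router
    category: str
    priority: int

def parse_structured(raw: str, schema):
    """Parse a model's JSON reply and check it against a dataclass schema."""
    data = json.loads(raw)
    expected = {f.name for f in fields(schema)}
    missing = expected - data.keys()
    if missing:
        raise ValueError(f"model reply is missing fields: {sorted(missing)}")
    return schema(**{name: data[name] for name in expected})

# Stub standing in for a reply from any provider (OpenAI, Anthropic, local, ...)
fake_reply = '{"category": "billing", "priority": 2}'
print(parse_structured(fake_reply, TicketTriage))
# -> TicketTriage(category='billing', priority=2)
```

In practice the raw string would come from whichever provider backend is configured; the validation step is what makes the model's output safe to hand to downstream code.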
DeepSpeed: System Optimizations for Large-Scale Model Training. Jie Ren, Hao Li, Samyam Rajbhandari, Conglong Li, Di He, Zhicheng Cui, Xuanli Chen, Junchao Li, Sholto Scruton, Minjia Zhang. ACM SIGOPS Operating Systems Review, Vol. 55 (ACM), 2021. DOI: 10.1145/3452044.3483742. Describes DeepSpeed, a comprehensive framework providing optimized distributed training capabilities, including various forms of model parallelism that complement tensor parallelism.

ZeRO: Memory Optimizations Toward Training Trillion Parameter Models. Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, Yuxiong He. SC '20: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (IEEE Computer Society), 2020. DOI: 10.1109/SC45903.2020.00078. While focused on memory optimizations for optimizer states, gradients, and parameters, ZeRO is crucial for enabling the training of models large enough to necessitate tensor parallelism.
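For orientation, ZeRO's three stages partition, in turn, the optimizer states, the gradients, and the parameters across data-parallel ranks, and DeepSpeed exposes them through its JSON config. Below is a minimal sketch of enabling stage 2; the tiny stand-in model and every numeric value are illustrative only, and a real run would be launched across multiple GPUs with the `deepspeed` launcher.

```python
# Minimal sketch: enabling ZeRO stage 2 via a DeepSpeed config dict.
# The Linear layer and all numeric values are illustrative placeholders.
import torch
import deepspeed

model = torch.nn.Linear(1024, 1024)  # stand-in for a real transformer

ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "gradient_accumulation_steps": 8,
    "fp16": {"enabled": True},
    # stage 1 partitions optimizer states; stage 2 adds gradients;
    # stage 3 also partitions the parameters themselves
    "zero_optimization": {"stage": 2},
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
}

# deepspeed.initialize returns an engine that shards optimizer/gradient
# state across data-parallel ranks according to the chosen ZeRO stage.
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```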