Hybrid Parallelism Strategies (DP+TP, DP+PP, etc.)

References

ZeRO: Memory Optimizations Toward Training Trillion Parameter Models, Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, Yuxiong He, 2020, SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, DOI: 10.48550/arXiv.1910.02054 - Introduces the ZeRO optimizer, a memory-efficient approach to data parallelism for large-scale models.
DeepSpeed Features: Model Parallelism, Microsoft DeepSpeed Team, 2023 (Microsoft DeepSpeed Team) - Gives an overview of ZeRO, pipeline parallelism, and the DeepSpeed-Megatron-LM integration for implementing hybrid parallelism strategies.
Efficient Large-Scale Language Model Training: A Case Study with Megatron-LM, NVIDIA Developer, 2021, NVIDIA Developer Blog (NVIDIA) - A practical guide from NVIDIA that demonstrates how to train large models using Megatron-LM's parallelism strategies.