Introduction to Megatron-LM

References

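Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism, Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, Bryan Catanzaro, 2019 arXiv preprint arXiv:1909.08053 DOI: 10.48550/arXiv.1909.08053 - The foundational paper introducing the Megatron-LM framework. It details the intra-layer (tensor) model parallelism used to train large transformer language models; pipeline parallelism was added to the framework in later work.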
NVIDIA Megatron-LM GitHub Repository, NVIDIA, 2024 - The official Megatron-LM source code, with practical examples of implementing model parallelism.
ZeRO: Memory Optimizations Toward Training Trillion Parameter Models, Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, Yuxiong He, 2020 SC '20: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (ACM) DOI: 10.1145/3416909.3417006 - Introduces ZeRO, a memory optimization strategy that partitions model states across data-parallel workers. It is often combined with Megatron-LM to make data-parallel training of very large models more memory-efficient.