Software and Hardware Environment
