ZeRO: Memory Optimizations Toward Training Trillion Parameter Models, Samyam Rajbhandari, Cong Xu, Yuxiong He, Jeff Rasley, Shaden Smith, Olatunji Ruwase, Dean Macy, 2020. SC '20: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (Association for Computing Machinery (ACM)). DOI: 10.1145/3429388.3444838 - Introduces ZeRO (Zero Redundancy Optimizer), a set of memory optimization techniques for distributed training that makes it possible to train models with billions to trillions of parameters.
Mixed Precision Training, Paulius Micikevicius, Sharan Narang, Jonah Alben, Gregory Diamos, Erich Elsen, David Garcia, Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, Hao Wu, 2018. International Conference on Learning Representations (ICLR). DOI: 10.48550/arXiv.1710.03740 - Describes mixed precision training with FP16, a technique that reduces memory usage and speeds up computation on compatible hardware.