ZeRO: Memory Optimizations Toward Training Trillion Parameter Models, Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, Yuxiong He, 2020. SC '20: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (IEEE Press). DOI: 10.1109/SC41405.2020.00024 - Describes ZeRO, the memory optimization technique in DeepSpeed for training large models. It partitions optimizer states, gradients, and parameters across data-parallel processes instead of replicating them, which improves LLM scalability.
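The Ray Project Documentation, Anyscale, Inc., 2024 - Official documentation for Ray, a distributed computing framework used in LLM workflows for data processing (Ray Data), distributed training (Ray Train), and serving (Ray Serve).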