ZeRO: Memory Optimizations Toward Training Trillion Parameter Models, Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, Yuxiong He, 2020. SC '20: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, IEEE Press. DOI: 10.1109/SC41405.2020.00024 - Describes ZeRO, the memory optimization technique used in DeepSpeed that partitions optimizer states, gradients, and parameters across data-parallel workers, making it foundational for scaling LLM training.
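For orientation, ZeRO is typically enabled through DeepSpeed's configuration dictionary. The sketch below is a minimal, hypothetical setup (the toy model, batch size, and stage choice are illustrative, not taken from the paper) showing how a ZeRO partitioning stage is switched on:

```python
# Minimal sketch of enabling ZeRO via DeepSpeed (all values illustrative).
import deepspeed
import torch.nn as nn

model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024))

ds_config = {
    "train_batch_size": 32,
    "fp16": {"enabled": True},
    # The ZeRO stage selects what gets partitioned across data-parallel ranks:
    # stage 1 = optimizer states; stage 2 = + gradients; stage 3 = + parameters.
    "zero_optimization": {"stage": 2},
}

# deepspeed.initialize wraps the model in an engine that applies the
# partitioning scheme during backward() and step().
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```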
The Ray Project Documentation, Anyscale, Inc., 2024 - Official documentation for Ray, a distributed computing system used for LLM data processing (Ray Data), distributed training (Ray Train), and serving (Ray Serve).
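As a small illustration of the Ray Data piece of that stack, here is a self-contained sketch of the per-row transform pattern commonly used for batch preprocessing of LLM corpora; the records and the token-counting step are made up for the example:

```python
# Minimal Ray Data sketch: build a tiny dataset and apply a per-row transform.
import ray

ray.init()  # starts a local Ray cluster for the example

ds = ray.data.from_items(
    [{"text": "hello world"}, {"text": "ray data example"}]
)

# map() applies the function to each row, distributed across the cluster.
def add_token_count(row):
    row["num_tokens"] = len(row["text"].split())  # stand-in for a real tokenizer
    return row

print(ds.map(add_token_count).take(2))
```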
LLMOps: Towards a Standardized, Streamlined, and Scalable Workflow for Large Language Models, Zhifeng Zhang, Bo Qiao, Mengjie Li, Guohao Li, Yuan Jiang, Chengyue Wu, Weiqi Wang, Long Cheng, Huaijun Li, Bo Dong, Xiaoyong Hu, 2023. arXiv preprint arXiv:2306.05942 - A survey paper that outlines the lifecycle and considerations of LLMOps, offering a conceptual framework for tooling choices.