ZeRO: Memory Optimizations Toward Training Trillion-Parameter Models, Samyam Rajbhandari, Jeff Rasley, Olatunde Ruwase, Yuxiong He, 2020, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '20) (IEEE), DOI: 10.1109/SC41405.2020.00024 - This paper introduces ZeRO, a family of memory optimizations that partition optimizer states, gradients, and parameters across data-parallel devices, eliminating redundant replicas so that much larger models can be trained without sacrificing data-parallel efficiency.
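To make the partitioning concrete, the sketch below reproduces the paper's per-device memory accounting for mixed-precision Adam training (2 bytes per parameter each for fp16 weights and gradients, plus K = 12 bytes of optimizer state). The function name and the 10^9-bytes-per-GB convention are our own, but the printed figures match the numbers the paper reports for a 7.5B-parameter model on 64 devices.

```python
def zero_memory_per_device_gb(num_params: float, num_devices: int, stage: int) -> float:
    """Approximate per-device memory (GB) for mixed-precision Adam training.

    stage 0: baseline data parallelism (all state replicated on every device)
    stage 1: partition optimizer states (P_os in the paper)
    stage 2: additionally partition gradients (P_os+g)
    stage 3: additionally partition parameters (P_os+g+p)
    """
    K = 12                    # optimizer-state bytes/param: fp32 weights, momentum, variance
    params = 2 * num_params   # fp16 parameters
    grads = 2 * num_params    # fp16 gradients
    opt = K * num_params      # Adam optimizer states
    if stage >= 1:
        opt /= num_devices
    if stage >= 2:
        grads /= num_devices
    if stage >= 3:
        params /= num_devices
    return (params + grads + opt) / 1e9

# 7.5B parameters on 64 devices: the 120 GB baseline shrinks to 31.4, 16.6,
# and finally 1.9 GB as each ZeRO stage removes another class of replica.
for s in range(4):
    print(f"stage {s}: {zero_memory_per_device_gb(7.5e9, 64, s):.1f} GB")
```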
A Cloud-Scale Architecture for Accelerating AI, Norman P. Jouppi, et al., 2021, Proceedings of the 48th Annual International Symposium on Computer Architecture (ISCA '21) (ACM), DOI: 10.1145/3465033.3467614 - Presents the architecture of Google's Tensor Processing Units (TPUs) for cloud-scale AI workloads, detailing their specialized interconnect and processing capabilities.
NVIDIA H100 Tensor Core GPU Architecture, NVIDIA Corporation, 2022 (NVIDIA) - An official whitepaper detailing the NVIDIA H100 GPU architecture, including the Transformer Engine and fourth-generation NVLink, both directly relevant to large language model training and inference.
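The Transformer Engine feature the whitepaper describes is exposed to PyTorch users through NVIDIA's open-source transformer_engine library. A minimal sketch of the usual FP8 usage pattern follows; the layer sizes and recipe settings are our own illustrative choices, and Hopper-class hardware is required to actually execute it.

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# FP8 recipe: HYBRID uses E4M3 for forward tensors and E5M2 for gradients,
# with delayed scaling factors derived from a history of absolute maxima.
fp8_recipe = recipe.DelayedScaling(fp8_format=recipe.Format.HYBRID)

layer = te.Linear(4096, 4096, bias=True).cuda()  # drop-in for torch.nn.Linear
x = torch.randn(8, 4096, device="cuda")

# Inside this context, supported layers run their matrix multiplies in FP8
# on the H100's Tensor Cores; outside it, they behave like ordinary layers.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)

loss = y.float().sum()
loss.backward()
```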