NVIDIA H100 GPU Architecture In-Depth, NVIDIA Corporation, 2022 (NVIDIA) - Discusses the architecture of the H100 GPU, including detailed information on fourth-generation NVLink and its role in accelerating multi-GPU communication.
Computer Architecture: A Quantitative Approach, John L. Hennessy, David A. Patterson, 2017 (Elsevier) - A classic textbook providing fundamental insights into computer architecture, including explanations of I/O subsystems and interconnects like PCIe.
A Survey of Communication Bottlenecks in Distributed Deep Learning, Youngeun Kang, Jianzong Li, Kai Zeng, Dingwen Zeng, 2020, ACM Computing Surveys, Vol. 53 (Association for Computing Machinery (ACM)), DOI: 10.1145/3418544 - This survey identifies and analyzes various communication bottlenecks in distributed deep learning, including the role of GPU interconnects like PCIe and NVLink.