CUDA C++ Programming Guide, NVIDIA Corporation, 2023 (NVIDIA Corporation) - Explains the GPU memory hierarchy, including global memory and memory access patterns, which are essential for understanding data transfer rates.
HBM3: The Next-Gen Memory Standard for AI and HPC, NVIDIA Corporation, 2022 (NVIDIA Corporation) - Describes High Bandwidth Memory (HBM) technology, its design, and why it benefits high-performance computing workloads such as AI training and large language models.
Computer Architecture: A Quantitative Approach, John L. Hennessy, David A. Patterson, 2017 (Morgan Kaufmann) - Provides foundational concepts of computer architecture, including the memory hierarchy, bandwidth, and latency, which underpin the analysis of GPU performance limits.