CUDA C++ Programming Guide, NVIDIA Corporation, 2023 (NVIDIA Corporation) - The official guide for understanding CUDA architecture, programming model, and core concepts for GPU acceleration.
NVIDIA cuDNN Developer Guide, NVIDIA Corporation, 2024 (NVIDIA Corporation) - Provides details on the highly optimized library for deep neural network primitives on NVIDIA GPUs.
Programming Massively Parallel Processors: A Hands-on Approach, David B. Kirk, Wen-mei W. Hwu, 2016 (Morgan Kaufmann) - A foundational textbook explaining GPU architecture and parallel programming principles, useful for understanding the CPU-GPU contrast.
Tensor Core Programmability for Deep Learning, Mark Fowers, Sudarshan Gopalakrishnan, Joshua L. Romero, Stephen W. Keckler, Michael B. O'Connor, John D. Owens, 2020, IEEE Micro, Vol. 40 (IEEE), DOI: 10.1109/MM.2020.2974377 - An academic paper detailing the architecture and operation of NVIDIA's Tensor Cores for accelerating deep learning computations.