Mixed Precision Training, Paulius Micikevicius, Sharan Narang, Jonah Alben, Gregory Diamos, Erich Elsen, David Garcia, Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, Hao Wu, 2018. ICLR 2018. DOI: 10.48550/arXiv.1710.03740 - Introduces methods for training deep neural networks in mixed precision, reducing memory footprint and speeding up training on GPUs with Tensor Cores.
CUDA C++ Programming Guide, NVIDIA Corporation, 2024 (NVIDIA) - The official guide for programming NVIDIA GPUs with CUDA, covering the architecture, programming model, and API.
NVIDIA Ampere Architecture In-Depth, Ronny Krashinsky, Olivier Giroux, Stephen Jones, Nick Stam, Sridhar Ramaswamy, 2020. NVIDIA Technical Blog (NVIDIA) - Detailed technical explanation of the NVIDIA Ampere GPU architecture, highlighting features like Tensor Cores and improved memory bandwidth for AI workloads.