Mixed Precision Training, Paulius Micikevicius, Sharan Narang, Jonah Alben, Gregory Diamos, Erich Elsen, David Garcia, Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, Hao Wu, 2018. International Conference on Learning Representations (ICLR). DOI: 10.48550/arXiv.1710.03740 - This paper introduced the core techniques of mixed-precision training, including gradient (loss) scaling, which are foundational to torch.cuda.amp; a minimal usage sketch follows this list.
Floating Point and Mixed Precision, NVIDIA Corporation, 2024. NVIDIA CUDA C++ Programming Guide - Explains the theoretical and hardware aspects of floating-point numbers, including the benefits and challenges of using mixed precision on NVIDIA GPUs with Tensor Cores.
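As a companion to the Micikevicius et al. entry above, here is a minimal sketch of how its gradient-scaling technique surfaces in PyTorch's torch.cuda.amp API; the linear model, optimizer settings, and synthetic data are hypothetical placeholders chosen only for illustration, and a CUDA-capable GPU is assumed.

```python
import torch
from torch import nn

# Hypothetical model and optimizer, purely for illustration.
model = nn.Linear(128, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

# GradScaler multiplies the loss by a scale factor so that small FP16
# gradients do not underflow to zero during the backward pass.
scaler = torch.cuda.amp.GradScaler()

for step in range(100):
    # Synthetic data standing in for a real training batch.
    inputs = torch.randn(32, 128, device="cuda")
    targets = torch.randn(32, 10, device="cuda")

    optimizer.zero_grad()
    # autocast runs eligible ops (e.g. matmuls) in FP16 while keeping
    # numerically sensitive ops in FP32.
    with torch.cuda.amp.autocast():
        loss = nn.functional.mse_loss(model(inputs), targets)

    scaler.scale(loss).backward()  # backprop on the scaled loss
    scaler.step(optimizer)         # unscales gradients; skips the step on inf/NaN
    scaler.update()                # adjusts the scale factor dynamically
```

The key design point mirrors the paper's loss-scaling idea: scaler.step unscales the gradients before the optimizer update and silently skips the update when an overflow is detected, while scaler.update grows or shrinks the scale factor over time.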