Mixed-Precision Training, Paulius Micikevicius, Sharan Narang, Jonah Alben, Gregory Diamos, Erich Elsen, David Garcia, Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, Hao Wu, 2018, International Conference on Learning Representations (ICLR). DOI: 10.48550/arXiv.1710.03740 - This foundational paper introduced mixed-precision training with FP16 alongside an FP32 master copy of the weights, and the concept of loss scaling (static and dynamic) to mitigate gradient underflow.
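A minimal sketch of the static loss-scaling idea from the paper: scale the loss before the backward pass so small FP16 gradients stay representable, then unscale into FP32 master gradients before the optimizer step. The model, scale factor (1024), and update loop here are illustrative assumptions, not code from the paper.

```python
import torch

# Illustrative FP16 model with an FP32 master copy of the weights (assumed setup).
model = torch.nn.Linear(512, 10).cuda().half()
master_params = [p.detach().float().requires_grad_() for p in model.parameters()]
optimizer = torch.optim.SGD(master_params, lr=0.01)
loss_scale = 1024.0  # static scale chosen to keep small gradients above FP16's underflow threshold

def training_step(x, y):
    # Compute the loss in FP32 for accuracy, then scale it before backward.
    loss = torch.nn.functional.cross_entropy(model(x).float(), y)
    (loss * loss_scale).backward()
    # Unscale the FP16 gradients into the FP32 master gradients.
    for p, mp in zip(model.parameters(), master_params):
        mp.grad = p.grad.float() / loss_scale
        p.grad = None
    optimizer.step()
    optimizer.zero_grad()
    # Copy the updated FP32 master weights back into the FP16 model.
    for p, mp in zip(model.parameters(), master_params):
        p.data.copy_(mp.data)
    return loss.item()
```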
Automatic Mixed Precision package - torch.cuda.amp, PyTorch, 2025 - Official documentation for PyTorch's Automatic Mixed Precision (AMP) package, including GradScaler for loss scaling, with usage examples.
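A short sketch of the standard torch.cuda.amp training loop with dynamic loss scaling via GradScaler; the model, optimizer, and dummy batch are illustrative assumptions.

```python
import torch
from torch.cuda.amp import autocast, GradScaler

model = torch.nn.Linear(512, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = GradScaler()  # maintains and dynamically adjusts the loss scale

# Dummy batch for illustration.
x = torch.randn(32, 512, device="cuda")
y = torch.randint(0, 10, (32,), device="cuda")

for _ in range(10):
    optimizer.zero_grad()
    with autocast():  # run forward pass ops in mixed precision where safe
        loss = torch.nn.functional.cross_entropy(model(x), y)
    scaler.scale(loss).backward()  # scale the loss to avoid FP16 gradient underflow
    scaler.step(optimizer)         # unscales gradients; skips the step if inf/NaN is found
    scaler.update()                # grows or shrinks the scale for the next iteration
```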