Mixed-Precision Training, Paulius Micikevicius, Sharan Narang, Jonah Alben, Gregory Diamos, Erich Elsen, David Garcia, Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, Hao Wu, 2018 (International Conference on Learning Representations, ICLR), DOI: 10.48550/arXiv.1710.03740 - Explains how to train deep neural networks with FP16 arithmetic while keeping an FP32 master copy of the weights and applying loss scaling, reducing memory use and increasing training speed (a minimal sketch of the recipe follows this list).
Precision for Deep Learning: From FP32 to INT8, Alexey Shcherbakov, 2021 (NVIDIA Developer Blog) - Provides a practical overview of the numerical precision formats used in deep learning and their impact on performance and memory use on NVIDIA GPUs.
Deep Learning (Chapter 4: Numerical Computation), Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - Provides foundational coverage of numerical computation, including floating-point arithmetic, overflow and underflow, and their implications for deep learning.
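The first reference describes a concrete training recipe, so a rough illustration may help. The NumPy sketch below is not taken from any of the cited works; all names and constants are illustrative assumptions. It mimics the core ideas of keeping an FP32 master copy of the weights, computing in FP16, and applying a loss scale so small gradients survive FP16's limited range. In the full method the loss itself is scaled before backpropagation so every intermediate gradient benefits; here the scale is applied to the final gradient only to keep the example short.

```python
# Minimal sketch of the mixed-precision idea: FP32 master weights,
# FP16 compute, and loss scaling. Illustrative only; a tiny linear
# regression stands in for a real network.
import numpy as np

rng = np.random.default_rng(0)

# Tiny synthetic regression problem: y = 2x + noise.
x = rng.standard_normal((256, 1)).astype(np.float32)
y = 2.0 * x + 0.01 * rng.standard_normal((256, 1)).astype(np.float32)

w_master = np.zeros((1, 1), dtype=np.float32)   # FP32 master weight
loss_scale = 1024.0                             # static loss scale (assumed value)
lr = 0.1

for step in range(100):
    w16 = w_master.astype(np.float16)           # FP16 copy used for compute
    x16, y16 = x.astype(np.float16), y.astype(np.float16)

    pred = x16 @ w16                            # FP16 forward pass
    err = pred - y16

    # Gradient of the mean squared error, computed in FP16 and multiplied by
    # the loss scale so small values survive FP16's limited range (FP16 cannot
    # represent nonzero magnitudes much below ~6e-8 and loses precision below ~6e-5).
    grad16 = (2.0 / len(x16)) * (x16.T @ err) * np.float16(loss_scale)

    # Unscale in FP32 and update the FP32 master weights.
    grad32 = grad16.astype(np.float32) / loss_scale
    w_master -= lr * grad32

print("learned weight:", w_master.item())       # should approach 2.0
```

In practice, frameworks automate this pattern (autocasting the forward pass, scaling the loss, and unscaling gradients before the optimizer step), but the division of labor is the same: low-precision arithmetic for speed and memory, a full-precision master copy and scaled gradients for numerical stability.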