Mixed Precision Training, Paulius Micikevicius, Sharan Narang, Jonah Alben, Gregory Diamos, Erich Elsen, David Garcia, Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, and Hao Wu, 2018 (International Conference on Learning Representations). DOI: 10.48550/arXiv.1710.03740 - Introduces mixed-precision training and key techniques such as loss scaling for FP16.
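
To make the paper's recipe concrete, here is a minimal PyTorch sketch of static loss scaling with FP32 master weights, the two techniques the entry highlights. The model, data shapes, optimizer, and the scale factor of 1024 are illustrative choices, not values prescribed by the paper:

```python
import torch
import torch.nn as nn

device = "cuda"
model = nn.Linear(512, 1).to(device).half()  # FP16 model weights
# FP32 master copy of the weights, updated by the optimizer
master_params = [p.detach().float().requires_grad_() for p in model.parameters()]
optimizer = torch.optim.SGD(master_params, lr=1e-3)
loss_scale = 1024.0  # illustrative static scale, not a recommendation

x = torch.randn(8, 512, device=device, dtype=torch.float16)
target = torch.randn(8, 1, device=device, dtype=torch.float16)

loss = nn.functional.mse_loss(model(x), target)
(loss * loss_scale).backward()  # scale up so small gradients survive FP16
for p, mp in zip(model.parameters(), master_params):
    mp.grad = p.grad.float() / loss_scale  # unscale into FP32 master gradients
optimizer.step()  # weight update happens in FP32
for p, mp in zip(model.parameters(), master_params):
    p.data.copy_(mp.data)  # copy updated FP32 weights back to the FP16 model
    p.grad = None
```
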
Training with BFloat16 on NVIDIA GPUs, Nikolaos Markidis, Andrew P. Overman, Michael Garland, and Jan-Dirk Wegner, 2020 (NVIDIA) - Describes BF16's design, advantages over FP16, and implementation on NVIDIA GPUs.
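
As a point of contrast with the FP16 recipe above, a minimal PyTorch sketch of BF16 autocast (the framework choice here is ours, not the reference's) shows the practical advantage the entry describes: BF16 shares FP32's exponent range, so no loss scaling is needed. It assumes a BF16-capable NVIDIA GPU (Ampere or newer):

```python
import torch

device = "cuda"
model = torch.nn.Linear(1024, 10).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

x = torch.randn(32, 1024, device=device)
y = torch.randint(0, 10, (32,), device=device)

optimizer.zero_grad()
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    loss = torch.nn.functional.cross_entropy(model(x), y)
loss.backward()   # gradients fit BF16's wide dynamic range; no GradScaler needed
optimizer.step()
```
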
Automatic Mixed Precision package - torch.cuda.amp, PyTorch Documentation, 2024 (PyTorch Foundation) - Official guide for implementing mixed-precision training in PyTorch using torch.cuda.amp and GradScaler.
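
The guide's core pattern pairs autocast for the forward pass with GradScaler for the backward pass. A minimal training-loop sketch, where the model, data, and hyperparameters are placeholders:

```python
import torch
import torch.nn as nn

device = "cuda"
model = nn.Linear(1024, 10).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()  # dynamic loss scaling for FP16

for _ in range(10):
    x = torch.randn(32, 1024, device=device)
    y = torch.randint(0, 10, (32,), device=device)
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():    # ops run in FP16 or FP32 as appropriate
        loss = loss_fn(model(x), y)
    scaler.scale(loss).backward()      # backward on the scaled loss
    scaler.step(optimizer)             # unscales grads; skips the step on inf/NaN
    scaler.update()                    # adjusts the scale factor for the next step
```
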
Mixed precision training, TensorFlow Documentation, 2024 (Google) - Official guide for implementing mixed-precision training in TensorFlow using tf.keras.mixed_precision.
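
The TensorFlow guide's recommended setup is a global "mixed_float16" dtype policy plus a float32 output layer for numeric stability; Model.compile then wraps the optimizer in a LossScaleOptimizer automatically. A minimal sketch, with the architecture and dataset as placeholders:

```python
import tensorflow as tf

# Compute runs in float16 while variables stay in float32
tf.keras.mixed_precision.set_global_policy("mixed_float16")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(10),
    # Keep the final softmax in float32, as the guide recommends
    tf.keras.layers.Activation("softmax", dtype="float32"),
])
# Under the mixed_float16 policy, compile wraps the optimizer in a
# LossScaleOptimizer that handles dynamic loss scaling automatically
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
```
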