Mixed-Precision Training of Deep Neural Networks, Paulius Micikevicius, Sharan Narang, Jonah Alben, Gregory Diamos, Erich Elsen, David Garcia, Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, Hao Wu, 2018. International Conference on Learning Representations (ICLR). DOI: 10.48550/arXiv.1710.03740 - Introduces effective techniques, such as loss scaling, for stabilizing FP16 training, providing foundational background for understanding the value of BF16's wider dynamic range.