Training large language models pushes the boundaries of available hardware, primarily due to the substantial memory footprint and computational cost of standard 32-bit floating-point (FP32) operations. Mixed-precision training offers a way to ease these constraints. By selectively using lower-precision numerical formats, specifically 16-bit floating-point (FP16) and bfloat16 (BF16), we can significantly reduce memory usage and often accelerate computation, particularly on hardware with specialized support such as NVIDIA Tensor Cores.
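As a quick point of reference, the short sketch below uses PyTorch's torch.finfo to print the bit width, dynamic range, and per-element storage of the three formats discussed in this chapter. The 7B-parameter figure in the final comment is our own illustrative arithmetic, not a measurement.

```python
import torch

# Compare the numeric properties of the formats covered in this chapter.
# torch.finfo reports machine limits for each floating-point dtype.
for dtype in (torch.float32, torch.float16, torch.bfloat16):
    info = torch.finfo(dtype)
    bytes_per_elem = torch.tensor([], dtype=dtype).element_size()
    print(f"{str(dtype):16s} bits={info.bits:2d} "
          f"bytes/elem={bytes_per_elem} max={info.max:.3e} eps={info.eps:.3e}")

# Illustrative arithmetic: a 7B-parameter model's weights alone occupy
# roughly 28 GB in FP32 (4 bytes/param) versus about 14 GB in FP16 or BF16.
```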
This chapter details the practical application of mixed-precision training for LLMs. We will examine the characteristics of the FP16 and BF16 formats and their trade-offs relative to FP32. We will cover the common stability challenges that arise with lower precision, especially the limited dynamic range of FP16, and introduce techniques such as loss scaling to maintain numerical stability during training. The role and benefits of the BF16 format will also be discussed. Finally, we will look at how the automatic mixed precision (AMP) support built into major deep learning frameworks simplifies implementation, as sketched below.
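To preview where the chapter is headed, here is a minimal sketch of a single training loop using PyTorch's AMP utilities, assuming a CUDA-capable GPU. The toy nn.Linear model, synthetic data, and hyperparameters are placeholders chosen only for illustration; later sections explain each piece in detail.

```python
import torch
from torch import nn

# Minimal AMP training loop: autocast runs eligible ops in FP16, while
# GradScaler applies dynamic loss scaling so small FP16 gradients do not
# underflow to zero during the backward pass.
device = "cuda"
model = nn.Linear(1024, 1024).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

inputs = torch.randn(32, 1024, device=device)
targets = torch.randn(32, 1024, device=device)

for step in range(10):
    optimizer.zero_grad(set_to_none=True)
    # Forward pass under autocast: matmuls run in FP16, reductions in FP32.
    with torch.cuda.amp.autocast():
        outputs = model(inputs)
        loss = nn.functional.mse_loss(outputs, targets)
    # Scale the loss before backward; step() unscales gradients and skips
    # the update if overflow is detected, then update() adjusts the scale.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```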
20.1 Introduction to Floating-Point Formats (FP32, FP16, BF16)
20.2 Benefits of Lower Precision (Speed, Memory)
20.3 Challenges with FP16 Training (Range Issues)
20.4 Loss Scaling Techniques
20.5 Using BF16 (BFloat16) Format
20.6 Framework Support for Mixed Precision (AMP)