While Post-Training Quantization (PTQ) offers a straightforward way to quantize models after they have been trained, it can sometimes lead to a noticeable drop in accuracy, especially when moving to very low precision like 4-bit integers. When preserving accuracy is a primary concern and the results from PTQ are insufficient, Quantization-Aware Training (QAT) presents a different approach.
QAT integrates the quantization process directly into the model training or fine-tuning phase. By simulating the effects of lower precision arithmetic during training, the model learns to adapt its weights to minimize the accuracy loss caused by quantization. This often yields higher accuracy for the final quantized model compared to applying PTQ to the same original model, particularly for aggressive quantization targets.
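To make the idea concrete, the sketch below shows the "fake quantization" (quantize-dequantize) step that QAT-style training inserts into the forward pass. It is a minimal illustration, not any specific framework's API: the fake_quantize helper, the range-based scale and zero-point, and the 4-bit setting are assumptions for demonstration, and the straight-through gradient trick it uses is treated in detail later in this chapter.

```python
import torch

def fake_quantize(x: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Quantize-dequantize ("fake quantization"): snaps x onto a low-precision
    grid, then maps it back to float so the rest of the network runs as usual."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # Asymmetric scale and zero-point derived from the tensor's observed range.
    scale = (x.max() - x.min()).clamp(min=1e-8) / (qmax - qmin)
    zero_point = torch.round(qmin - x.min() / scale)
    q = torch.clamp(torch.round(x / scale + zero_point), qmin, qmax)
    x_dq = (q - zero_point) * scale
    # Straight-through estimator: forward uses the quantized values, while the
    # backward pass treats the rounding as identity so gradients still flow.
    return x + (x_dq - x).detach()

# Example: weights pass through fake quantization in the forward pass, so the
# training loss reflects the rounding error the deployed integer model will see.
w = torch.randn(8, 8, requires_grad=True)
loss = fake_quantize(w, num_bits=4).pow(2).sum()
loss.backward()
print(w.grad.shape)  # gradients reach the full-precision weights
```

Because the rounding error enters the loss at every training step, the optimizer nudges the full-precision weights toward values that survive quantization with less accuracy loss.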
In this chapter, you will learn the fundamentals of QAT, working through the topics listed below. Completing it will equip you with the knowledge to determine when QAT is appropriate and how to implement it to produce more accurate low-precision models.
4.1 Need for Quantization-Aware Training
4.2 Simulating Quantization Effects During Training
4.3 Straight-Through Estimator (STE)
4.4 Implementing QAT with Deep Learning Frameworks
4.5 Fine-tuning Models with Quantization Nodes
4.6 Benefits and Drawbacks of QAT vs. PTQ
4.7 Practical Considerations for QAT Execution
4.8 Hands-on Practical: Setting up a Simple QAT Run