While Post-Training Quantization (PTQ) offers a straightforward way to reduce model size and accelerate inference, the basic methods discussed earlier can sometimes lead to significant accuracy degradation, particularly when aiming for very low precision like 4-bit integers (INT4). Simple calibration might not capture enough information to preserve the model's performance effectively.
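To make this concrete, the sketch below (a minimal, illustrative example, not code from this course) applies naive symmetric round-to-nearest INT4 quantization to a toy weight matrix in PyTorch. The helper names `quantize_rtn_int4` and `dequantize` are hypothetical; the point is only to show how a single large weight stretches the quantization scale and crowds the remaining weights into a few levels, producing the kind of reconstruction error that advanced PTQ methods aim to reduce.

```python
# Minimal illustrative sketch: naive round-to-nearest (RTN) per-tensor INT4
# quantization, showing why accuracy can degrade at very low precision.
import torch

def quantize_rtn_int4(weight: torch.Tensor):
    """Symmetric per-tensor round-to-nearest quantization to 4-bit integers."""
    qmax = 7  # signed INT4 covers [-8, 7]; use symmetric [-7, 7] for simplicity
    scale = weight.abs().max() / qmax
    q = torch.clamp(torch.round(weight / scale), -qmax, qmax)
    return q.to(torch.int8), scale  # INT4 values stored in an int8 container

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Map quantized integers back to floating point."""
    return q.float() * scale

# A toy weight matrix with one large outlier: the outlier stretches the scale,
# so most of the small weights collapse onto very few quantization levels.
w = torch.randn(4, 8) * 0.02
w[0, 0] = 1.5  # outlier weight

q, s = quantize_rtn_int4(w)
w_hat = dequantize(q, s)
print("mean absolute reconstruction error:", (w - w_hat).abs().mean().item())
```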
This chapter focuses on advanced PTQ techniques specifically developed to achieve better accuracy, often approaching the performance of the original full-precision model, without requiring retraining.
You will learn about GPTQ and the mechanics of its quantization algorithm, AWQ (Activation-aware Weight Quantization), SmoothQuant as a way to mitigate activation outliers, how these advanced PTQ methods compare, and practical considerations for implementing them.
By the end of this chapter, you will understand the principles behind these advanced techniques and be equipped to apply them for more effective LLM quantization.
3.1 Introduction to GPTQ
3.2 Understanding GPTQ Algorithm Mechanics
3.3 AWQ: Activation-aware Weight Quantization
3.4 SmoothQuant: Mitigating Activation Outliers
3.5 Comparing Advanced PTQ Methods
3.6 Implementation Considerations for Advanced PTQ
3.7 Hands-on Practical: Quantizing with GPTQ