Revisiting Quantization Principles for Large Models
Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, Dmitry Kalenichenko (2017). "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference." arXiv preprint arXiv:1712.05877. DOI: 10.48550/arXiv.1712.05877. This foundational paper introduces the quantization scheme used in TensorFlow Lite, defining core concepts such as symmetric and asymmetric quantization, per-tensor granularity, and calibration for post-training quantization. It is an essential resource for understanding these basic principles.
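The asymmetric affine scheme this paper defines maps a real value r to an integer q via r = S * (q - Z), where S is a real-valued scale and Z an integer zero-point chosen so that real zero is exactly representable. A minimal sketch of post-training, per-tensor calibration under that scheme (function names here are illustrative, not from TensorFlow Lite or any other library):

```python
import numpy as np

def quantize_asymmetric(x, num_bits=8):
    """Per-tensor asymmetric quantization in the style of Jacob et al. (2017).

    Calibrates scale S and zero-point Z from the observed min/max of x,
    then maps each real value r to an integer q with r ~= S * (q - Z).
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    # Calibration: include 0.0 in the range so that real zero maps to an
    # exact integer, which the paper requires for correct zero-padding.
    rmin, rmax = min(x.min(), 0.0), max(x.max(), 0.0)
    scale = (rmax - rmin) / (qmax - qmin)
    zero_point = int(round(qmin - rmin / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate real values from the integer representation."""
    return scale * (q.astype(np.float32) - zero_point)
```

For example, quantizing the tensor `[-1.0, 0.0, 1.0, 2.0]` to 8 bits yields a zero-point of 85, and dequantizing recovers the original values to within one quantization step.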