Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference, Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, Dmitry Kalenichenko, 2017. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). DOI: 10.48550/arXiv.1712.05877 - A foundational paper introducing an affine quantization scheme based on a scale and zero-point for integer-only inference, together with a quantization-aware training procedure; this scheme is widely adopted in frameworks such as TensorFlow Lite.
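As a rough illustration of the scale/zero-point mapping the paper describes, here is a minimal NumPy sketch (not the paper's reference implementation; the function names and the uint8 range are assumptions for the example):

```python
import numpy as np

def quantize(x, scale, zero_point, qmin=0, qmax=255):
    # Affine quantization: q = clamp(round(x / scale) + zero_point, qmin, qmax)
    q = np.round(x / scale) + zero_point
    return np.clip(q, qmin, qmax).astype(np.uint8)

def dequantize(q, scale, zero_point):
    # Approximate reconstruction: x ≈ scale * (q - zero_point)
    return scale * (q.astype(np.float32) - zero_point)

# Example: map an observed float range [-1.0, 1.0] onto uint8.
x = np.array([-1.0, -0.5, 0.0, 0.5, 1.0], dtype=np.float32)
scale = (x.max() - x.min()) / 255.0
zero_point = int(round(-x.min() / scale))
print(dequantize(quantize(x, scale, zero_point), scale, zero_point))
```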
Quantization for Deep Learning Models, PyTorch Documentation, 2019 (PyTorch Foundation) - The official PyTorch documentation provides comprehensive guides and API references for implementing both post-training quantization (PTQ) and quantization-aware training (QAT), matching the examples in this section.
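For orientation, a minimal post-training dynamic quantization sketch using the public torch.quantization API is shown below; defaults and namespaces vary across PyTorch releases, and this is an illustrative snippet rather than the documentation's own example:

```python
import torch
import torch.nn as nn

# A small float model; Linear layers are the typical dynamic-quantization target.
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10)).eval()

# Post-training dynamic quantization: weights are stored as int8,
# activations are quantized on the fly at inference time.
quantized_model = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 128)
print(quantized_model(x).shape)  # torch.Size([1, 10])
```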