Quantization for Deep Learning Models, PyTorch Documentation, 2019 (PyTorch Foundation) - Provides practical guidance and implementation details for various quantization types, including static and dynamic quantization, within the PyTorch framework.
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference, Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, Dmitry Kalenichenko, 2018, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE), DOI: 10.1109/CVPR.2018.00286 - A foundational paper that introduces methods for quantizing neural networks to enable efficient integer-only inference, laying the technical groundwork for static quantization schemes.
Model optimization overview | TensorFlow Lite, TensorFlow Lite Documentation, 2024 (Google) - Offers an overview of different model optimization techniques, including static and dynamic quantization, as applied within the TensorFlow Lite ecosystem.