Quantization for Deep Learning: A Comprehensive Survey, Zhenghong Zhao, Jincheng Zheng, Yunxiao Li, Yanggang An, Fan Xu, Junqing Xia, and Shengli Zhang, 2021. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 44 (IEEE). DOI: 10.1109/TPAMI.2021.3129532 - Provides an overview of various quantization techniques, including post-training quantization, calibration methods, and their impact on model performance.
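To make the calibration step such surveys describe concrete, here is a minimal NumPy sketch of affine post-training quantization with min-max calibration; the function names and the random calibration batch are illustrative assumptions, not code from the paper.

```python
import numpy as np

def minmax_calibrate(x, num_bits=8):
    # Derive an affine (scale, zero_point) pair from observed values.
    qmin, qmax = 0, 2 ** num_bits - 1
    x_min, x_max = min(float(x.min()), 0.0), max(float(x.max()), 0.0)
    scale = max((x_max - x_min) / (qmax - qmin), 1e-12)  # avoid zero scale
    zero_point = int(round(qmin - x_min / scale))
    return scale, zero_point

def quantize(x, scale, zero_point, num_bits=8):
    q = np.round(x / scale) + zero_point
    return np.clip(q, 0, 2 ** num_bits - 1).astype(np.uint8)

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

# Calibrate on a representative batch, then measure round-trip error.
calib = np.random.randn(1024).astype(np.float32)
scale, zp = minmax_calibrate(calib)
x = np.random.randn(16).astype(np.float32)
error = np.abs(dequantize(quantize(x, scale, zp), scale, zp) - x).max()
```

Values outside the calibrated range are clipped, which is why calibration quality largely determines post-training accuracy.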
Post-training quantization, TensorFlow, 2024 (Google AI) - Official documentation detailing the practical aspects of post-training quantization within a major ML framework, covering calibration, benefits, and limitations.
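As a concrete illustration of the workflow that documentation covers, the sketch below converts a SavedModel to a fully integer-quantized TFLite model; `saved_model_dir` and the random calibration batches are placeholders for the reader's own model and representative data.

```python
import numpy as np
import tensorflow as tf

# Stand-in calibration data; replace with real samples matching the
# model's input shape (assumed here to be 1x224x224x3).
representative_data = [np.random.rand(1, 224, 224, 3).astype(np.float32)
                       for _ in range(8)]

def representative_dataset():
    for batch in representative_data:
        yield [batch]

# "saved_model_dir" is a placeholder path to the float model.
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Restrict the model to int8 kernels and make its I/O integer as well.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_model = converter.convert()
with open("model_int8.tflite", "wb") as f:
    f.write(tflite_model)
```

Without a representative dataset the converter falls back to weight-only (dynamic range) quantization, one of the trade-offs the documentation discusses.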
TVM: An Automatic End-to-End Optimizing Compiler for Deep Learning, Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Haichen Shen, Luis Ceze, Carlos Guestrin, and Arvind Krishnamurthy, 2018. OSDI '18: Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation. DOI: 10.1145/3294344.3259969 - Introduces an end-to-end optimizing compiler for deep learning, discussing its support for low-precision operations and graph optimizations relevant to PTQ.
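For a sense of how such a compiler exposes low-precision rewriting, the sketch below runs TVM's Relay quantization pass on a toy graph; the single-convolution model, shapes, and global-scale settings are illustrative assumptions, and real use would import a trained model through a Relay frontend such as relay.frontend.from_onnx.

```python
import numpy as np
import tvm
from tvm import relay

# Toy Relay graph standing in for an imported network.
data = relay.var("data", shape=(1, 3, 32, 32), dtype="float32")
weight = relay.var("weight", shape=(8, 3, 3, 3), dtype="float32")
conv = relay.nn.conv2d(data, weight, kernel_size=(3, 3), channels=8,
                       padding=(1, 1))
mod = tvm.IRModule.from_expr(relay.Function([data, weight], conv))
params = {"weight": tvm.nd.array(np.random.randn(8, 3, 3, 3)
                                 .astype("float32"))}

# Post-training quantization as a graph-rewriting pass, calibrated here
# with a simple global scale rather than a dataset; skip_conv_layers=[]
# forces even the first conv to be quantized in this toy graph.
with relay.quantize.qconfig(global_scale=8.0, nbit_input=8,
                            nbit_weight=8, skip_conv_layers=[]):
    qmod = relay.quantize.quantize(mod, params)

# The quantized module compiles through the same pipeline as a float one.
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(qmod, target="llvm")
```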
Data-Free Quantization Through Weight Equalization and Bias Correction, Markus Nagel, Mart van Baalen, Tijmen Blankevoort, and Max Welling, 2019. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (IEEE). DOI: 10.1109/ICCV.2019.00936 - Presents techniques to improve the accuracy of post-training quantization without requiring additional training data, addressing a limitation of PTQ.
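The core of the weight-equalization idea is compact enough to sketch directly: for a Linear -> ReLU -> Linear pair, per-channel scales can be moved between the two weight matrices without changing the network's function, because ReLU commutes with positive scaling. The NumPy sketch below follows the paper's closed-form choice of scales; the layer shapes are illustrative.

```python
import numpy as np

def equalize_pair(W1, b1, W2, eps=1e-8):
    # Cross-layer equalization for y = W2 @ relu(W1 @ x + b1).
    r1 = np.abs(W1).max(axis=1)        # per-output-channel range of layer 1
    r2 = np.abs(W2).max(axis=0)        # per-input-channel range of layer 2
    s = np.sqrt(r1 * r2) / (r2 + eps)  # closed-form scales from the paper
    return W1 / s[:, None], b1 / s, W2 * s[None, :]

# Give W1's channels deliberately skewed ranges, as in hard PTQ cases.
W1 = np.random.randn(64, 32) * np.logspace(-2, 1, 64)[:, None]
b1, W2 = np.random.randn(64), np.random.randn(10, 64)
W1e, b1e, W2e = equalize_pair(W1, b1, W2)

# The function is preserved, but per-channel weight ranges are now
# equalized, so a single per-tensor quantization scale loses far less.
x = np.random.randn(32)
y = W2 @ np.maximum(W1 @ x + b1, 0.0)
ye = W2e @ np.maximum(W1e @ x + b1e, 0.0)
assert np.allclose(y, ye)
```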