Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference, Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, and Dmitry Kalenichenko, 2018. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE. DOI: 10.1109/CVPR.2018.00216 - This paper introduced Quantization-Aware Training (QAT), a technique discussed in the section, providing a foundational understanding of how models can be trained to be robust to quantization.
Q-Diffusion: Quantizing Diffusion Models for Efficient Generation, Yefei He, Hanyu Wang, Xiangyu Sun, Jianxing Xu, Qingyi Gu, Yang Liu, Zhaodong Wang, Zhangyang Wang, Kaiyuan Guo, and Wenshuo Li, 2023. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE. DOI: 10.1109/CVPR52688.2023.00762 - This paper directly addresses the challenges of quantizing diffusion models and proposes solutions, offering insights into practical implementation and strategies for maintaining generative quality.
Outlier-Aware Quantization for Diffusion Models, Qingyi Gu, Yefei He, Fan Yang, Yihua Ye, Jianxing Xu, Zhangyang Wang, Kaiyuan Guo, and Wenshuo Li, 2023. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE. DOI: 10.1109/CVPR52688.2023.00761 - This work specifically tackles the issue of dynamic ranges and outliers in diffusion model activations, a major challenge highlighted in the section, providing advanced techniques for more effective quantization.