GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers, Elias Frantar, Saleh Ashkboos, Torsten Hoefler, Dan Alistarh, 2022. arXiv preprint arXiv:2210.17323. DOI: https://doi.org/10.48550/arXiv.2210.17323 - Introduces the GPTQ algorithm, a layer-wise, error-compensating post-training quantization method for LLMs that aims to achieve low-bit quantization with minimal loss in accuracy.
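To make the "layer-wise, error-compensating" description concrete, below is a minimal Python/NumPy sketch of the core idea: quantize the weight matrix one column at a time, and push each column's rounding error into the not-yet-quantized columns via the inverse Hessian of the layer's reconstruction loss. This is an illustrative simplification, not the authors' implementation (which uses a Cholesky-based inverse Hessian, blocked lazy updates, and per-group scales); the names `gptq_layer` and `quantize_rtn`, the ridge term `lam`, and the fixed quantization `scale` are hypothetical choices for the sketch.

```python
import numpy as np

def quantize_rtn(w, scale):
    """Round-to-nearest onto a signed 4-bit grid (simplified; real GPTQ
    derives per-row/group scales from the weights)."""
    return np.clip(np.round(w / scale), -8, 7) * scale

def gptq_layer(W, X, lam=0.01, scale=0.05):
    """Sketch of GPTQ-style error-compensated quantization of one layer.

    W: (rows, cols) weight matrix of a linear layer.
    X: (cols, samples) calibration inputs to that layer.
    The Hessian of the layer-wise loss ||WX - QX||^2 is H = 2 X X^T
    (plus a small ridge for stability). After quantizing column q, the
    remaining columns are shifted to absorb the rounding error.
    """
    W = W.copy().astype(np.float64)
    cols = W.shape[1]
    H = 2.0 * X @ X.T + lam * np.eye(cols)
    Hinv = np.linalg.inv(H)
    Q = np.zeros_like(W)
    for q in range(cols):
        Q[:, q] = quantize_rtn(W[:, q], scale)
        # Normalized rounding error of this column.
        err = (W[:, q] - Q[:, q]) / Hinv[q, q]
        # Compensate all not-yet-quantized columns.
        W[:, q + 1:] -= np.outer(err, Hinv[q, q + 1:])
    return Q
```

Run on each linear layer independently with a small calibration batch, e.g. `Q = gptq_layer(W, X_calib)`; the layer-wise formulation is what lets the method scale to multi-billion-parameter models without end-to-end retraining.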