Post-Training Quantization (PTQ) Algorithms for LLMs
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers, Elias Frantar, Saleh Ashkboos, Torsten Hoefler, Dan Alistarh, 2022, arXiv preprint arXiv:2210.17323, DOI: https://doi.org/10.48550/arXiv.2210.17323 - Introduces the GPTQ algorithm, a layer-wise, error-compensated post-training quantization method for LLMs, targeting low-bit precision with minimal accuracy degradation (see the sketch after these references).
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration, Ji Lin, Jiaming Tang, Haotian Tang, Shang Yang, Wei-Ming Chen, Wei-Chen Wang, Guangxuan Xiao, Xingyu Dang, Chuang Gan, Song Han, 2023, arXiv preprint arXiv:2306.00978, DOI: https://doi.org/10.48550/arXiv.2306.00978 - Presents AWQ, a post-training quantization method that scales weights based on activation magnitudes to protect important weights, offering a fast and accurate solution for LLMs (see the sketch after these references).
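To make the GPTQ entry above concrete, here is a minimal NumPy sketch of the layer-wise, error-compensated idea: weights are quantized one column at a time, and each column's quantization error is redistributed onto the not-yet-quantized columns using the inverse of a calibration Hessian. This is an illustrative simplification under assumed shapes (dense matrix inverse, plain per-row round-to-nearest grid, no blocking), not the paper's blocked Cholesky implementation; the function name `gptq_like_quantize` is invented for this sketch.

```python
import numpy as np

def gptq_like_quantize(W, X, n_bits=4, damp=0.01):
    """Sketch of layer-wise, error-compensated quantization (GPTQ-style).

    W: (out_features, in_features) weight matrix of one linear layer.
    X: (in_features, n_samples) calibration activations feeding that layer.
    """
    W = W.astype(np.float64).copy()
    in_features = W.shape[1]
    qmax = 2 ** (n_bits - 1) - 1

    # Per-output-channel quantization grid, fixed before the column sweep.
    row_scales = np.abs(W).max(axis=1) / qmax + 1e-12

    # Hessian of the layer-wise reconstruction loss ||W X - W_q X||^2,
    # with dampening for numerical stability.
    H = 2.0 * X @ X.T
    H += damp * np.mean(np.diag(H)) * np.eye(in_features)
    Hinv = np.linalg.inv(H)  # simplified; the paper uses a Cholesky factorization

    for j in range(in_features):
        # Quantize column j on the per-row grid.
        q = np.clip(np.round(W[:, j] / row_scales), -qmax - 1, qmax) * row_scales
        # Error compensation: push the normalized error onto remaining columns.
        err = (W[:, j] - q) / Hinv[j, j]
        W[:, j] = q
        if j + 1 < in_features:
            W[:, j + 1:] -= np.outer(err, Hinv[j, j + 1:])
    return W
```

For example, `gptq_like_quantize(np.random.randn(64, 128), np.random.randn(128, 256))` returns a 4-bit-grid weight matrix whose reconstruction error on the calibration activations is lower than plain round-to-nearest of the same matrix.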
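Similarly, a hedged sketch of the activation-aware scaling idea from the AWQ entry: per-input-channel scales are derived from average activation magnitudes, salient weight channels are amplified before a plain round-to-nearest quantization, and the inverse scale is divided out of the activations (or fused into the preceding operation) so the full-precision computation is unchanged. The exponent `alpha` stands in for the per-layer search over scaling strengths described in the paper, and `awq_like_quantize` is a name made up for this sketch.

```python
import numpy as np

def awq_like_quantize(W, X, n_bits=4, alpha=0.5):
    """Sketch of activation-aware weight scaling (AWQ-style) + round-to-nearest.

    W: (out_features, in_features) weight matrix.
    X: (in_features, n_samples) calibration activations.
    Returns quantized weights and the per-input-channel scales that must be
    folded into the preceding op (or divided out of activations) at inference.
    """
    # Salience proxy: average activation magnitude per input channel.
    act_mag = np.abs(X).mean(axis=1) + 1e-8
    s = act_mag ** alpha                     # alpha is grid-searched in the paper
    s /= np.sqrt(s.max() * s.min()) + 1e-12  # keep the scales centered around 1

    W_scaled = W * s                         # amplify salient input channels
    qmax = 2 ** (n_bits - 1) - 1
    row_scales = np.abs(W_scaled).max(axis=1, keepdims=True) / qmax + 1e-12
    W_q = np.clip(np.round(W_scaled / row_scales), -qmax - 1, qmax) * row_scales
    return W_q, s

# Equivalence used at inference: W @ x == (W * s) @ (x / s), so the activations
# are divided by s (or s is fused into the previous layer), while only the
# scaled weights are actually quantized.
```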