AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration, Ji Lin, Jiaming Tang, Haotian Tang, Shang Yang, Wei-Ming Chen, Wei-Chen Wang, Guangxuan Xiao, Xingyu Dang, Chuang Gan, and Song Han, 2023. arXiv preprint. DOI: 10.48550/arXiv.2306.00978 - Proposes a weight-only quantization method that uses activation statistics to identify salient weight channels and protect them via per-channel scaling, preserving model quality for efficient LLM deployment (a simplified sketch of the idea follows this list).
Measuring Massive Multitask Language Understanding, Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt, 2021. International Conference on Learning Representations (ICLR). DOI: 10.48550/arXiv.2009.03300 - Presents a comprehensive benchmark for evaluating the knowledge and reasoning abilities of language models across a wide range of subjects, essential for assessing model quality.
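For context on what "activation-aware" means in the AWQ entry above, here is a minimal NumPy sketch of the general idea: weight columns that see large activations are scaled up before round-to-nearest quantization, and the scale is undone afterwards, so the most salient channels suffer less rounding error. The function names, the fixed exponent `alpha`, and the mean-absolute-activation statistic are illustrative assumptions, not the paper's exact algorithm (which searches for the scaling factors on calibration data).

```python
# Rough sketch of activation-aware per-channel scaling before weight quantization.
# All names and the fixed `alpha` exponent are illustrative assumptions.
import numpy as np

def quantize_per_row(w, n_bits=4):
    """Symmetric round-to-nearest quantization with one scale per output row."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    return np.round(w / scale) * scale

def activation_aware_quantize(w, act_samples, n_bits=4, alpha=0.5):
    """w: (out, in) weights; act_samples: (tokens, in) calibration activations."""
    act_mag = np.abs(act_samples).mean(axis=0) + 1e-8      # per-input-channel activation size
    s = act_mag ** alpha                                    # scale salient channels up
    w_scaled = w * s[None, :]                               # fold scale into the weights
    return quantize_per_row(w_scaled, n_bits) / s[None, :]  # quantize, then undo the scale

# Usage: compare rounding error on the most active input channels.
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 128))
acts = rng.normal(size=(256, 128)) * np.linspace(0.1, 5.0, 128)  # some channels dominate
salient = np.abs(acts).mean(axis=0).argsort()[-16:]
err_plain = np.abs(quantize_per_row(w) - w)[:, salient].mean()
err_aware = np.abs(activation_aware_quantize(w, acts) - w)[:, salient].mean()
print(f"plain RTN error on salient channels: {err_plain:.4f}")
print(f"activation-aware error on salient channels: {err_aware:.4f}")
```

The comparison only illustrates the trade-off the annotation describes: scaling salient channels up gives them finer effective quantization resolution at the expense of less active channels.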