vLLM Documentation, vLLM Developers, 2024 - Official documentation providing comprehensive guides, API references, and practical examples for using vLLM, including support for quantized models.
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration, Ji Lin, Jiaming Tang, Haotian Tang, Shang Yang, Wei-Ming Chen, Wei-Chen Wang, Guangxuan Xiao, Xingyu Dang, Chuang Gan, Song Han, 2023 (MLSys 2024), DOI: 10.48550/arXiv.2306.00978 - Presents Activation-aware Weight Quantization (AWQ), a post-training quantization method designed specifically for large language models to reduce memory usage and accelerate inference.