LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale, Tim Dettmers, Mike Lewis, Younes Belkada, Luke Zettlemoyer, 2022. 36th Conference on Neural Information Processing Systems (NeurIPS 2022). DOI: 10.48550/arXiv.2208.07339 - Introduces an 8-bit quantization method designed specifically for large language models, illustrating the practical application of post-training quantization (PTQ) concepts to LLMs.
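To make the PTQ idea behind this entry concrete, here is a minimal sketch of per-row absmax int8 quantization, the vector-wise scaling component of LLM.int8(). This is an illustrative simplification, not the full method: the paper additionally decomposes outlier feature dimensions and keeps them in 16-bit precision. Function names here are hypothetical.

```python
import numpy as np

def int8_quantize_rowwise(W):
    # Per-row absmax scaling: map each row's max magnitude to 127.
    # Simplified sketch; LLM.int8() also routes outlier columns to fp16.
    scale = np.abs(W).max(axis=1, keepdims=True) / 127.0
    scale[scale == 0] = 1.0  # avoid division by zero for all-zero rows
    q = np.clip(np.round(W / scale), -127, 127).astype(np.int8)
    return q, scale

def int8_dequantize(q, scale):
    # Recover an approximation of the original float weights.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8)).astype(np.float32)
q, s = int8_quantize_rowwise(W)
W_hat = int8_dequantize(q, s)
err = np.abs(W - W_hat).max()  # bounded by half a quantization step
```

Rounding error per element is at most half a scale step, which is why absmax scaling degrades when a row contains large outliers; handling those outliers separately is the paper's key contribution.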