Quantization with Hugging Face Transformers, Hugging Face, 2024 - Official guide in the Hugging Face Transformers documentation, covering how to load and run quantized models, including those in the GPTQ format.
ExLlamaV2 GitHub Repository, turboderp and contributors, 2024 (turboderp-org) - Repository for ExLlamaV2, an inference library optimized for fast, memory-efficient execution of GPTQ and similar weight-quantized models on NVIDIA GPUs.