Python toolkit for building production-ready LLM applications. Modular utilities for prompts, RAG, agents, structured outputs, and multi-provider support.
QLoRA: Efficient Finetuning of Quantized LLMs, Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, Luke Zettlemoyer, 2023. arXiv preprint arXiv:2305.14314. DOI: 10.48550/arXiv.2305.14314 - Introduces QLoRA and the 4-bit NormalFloat (NF4) data type, a significant method for 4-bit quantization of large language models that enables efficient fine-tuning on consumer hardware.
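The core idea behind NF4 can be sketched in a few lines: place 16 quantization levels at evenly spaced quantiles of a standard normal distribution (since trained weights are roughly normally distributed), rescale each block of weights by its absolute maximum, and store only the 4-bit index of the nearest level plus one scale per block. This is a minimal pure-Python sketch of that construction, not the bitsandbytes implementation; the exact NF4 codebook in the paper differs slightly from these computed quantiles.

```python
import statistics

def make_nf4_levels(k: int = 16) -> list[float]:
    """Approximate NF4 codebook: k levels at normal quantiles, scaled to [-1, 1].

    Sketch of the construction described in the QLoRA paper; the exact
    bitsandbytes table is derived slightly differently.
    """
    nd = statistics.NormalDist()
    quantiles = [(i + 0.5) / k for i in range(k)]  # avoid the 0/1 endpoints
    levels = [nd.inv_cdf(q) for q in quantiles]
    scale = max(abs(v) for v in levels)
    return [v / scale for v in levels]  # endpoints land exactly at -1 and +1

def quantize_nf4(weights: list[float], levels: list[float]):
    """Absmax-scale a block of weights, then snap each to the nearest level."""
    absmax = max(abs(w) for w in weights) or 1.0
    idxs = [min(range(len(levels)), key=lambda i: abs(w / absmax - levels[i]))
            for w in weights]
    return idxs, absmax  # 4-bit indices plus one float scale per block

def dequantize_nf4(idxs: list[int], absmax: float, levels: list[float]):
    """Reconstruct approximate weights from indices and the block scale."""
    return [levels[i] * absmax for i in idxs]
```

Because storage is 4 bits per weight plus one scale per block, this is roughly a 4x reduction over fp16; the normal-quantile spacing keeps quantization error low where weights are densest, near zero.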
Quantization for Model Optimization, PyTorch Documentation, 2019 - Official PyTorch documentation with practical guidance and API references for implementing post-training quantization and quantization-aware training.