Quantization and Training of Neural Networks for Efficient Inference, Benoit Jacob, Skirmantas Kligys, Shengkuan Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, Dmitry Kalenichenko, Vivienne Sze, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE), DOI: 10.1109/CVPR.2018.00696 - Introduces a widely adopted post-training quantization method for 8-bit integers, providing a basis for many practical implementations.
Quantization for PyTorch Models, PyTorch Documentation, 2024 (PyTorch) - Official documentation explaining how quantization is implemented in a popular deep learning framework, including support for various integer types.