Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference, Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, Dmitry Kalenichenko, 2018, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE), DOI: 10.1109/CVPR.2018.00097 - A foundational paper introducing the principles of post-training quantization and quantization-aware training, detailing the role of activation calibration for static quantization.
Post Training Static Quantization, PyTorch Documentation, 2019 (PyTorch Foundation) - Provides practical guidance and API details for implementing static post-training quantization in PyTorch, including how to collect and use calibration data.
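The calibration workflow described in the PyTorch reference above can be sketched in eager mode: insert observers, run representative data through the model so activation ranges are recorded, then convert to int8 modules. The model and random calibration data below are stand-ins, not part of the cited documentation.

```python
import torch
import torch.nn as nn

# Minimal sketch of eager-mode post-training static quantization.
# SmallNet and the random calibration batches are illustrative placeholders.
class SmallNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()      # float -> int8 boundary
        self.fc = nn.Linear(8, 4)
        self.relu = nn.ReLU()
        self.dequant = torch.quantization.DeQuantStub()  # int8 -> float boundary

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.fc(x))
        return self.dequant(x)

model = SmallNet().eval()
model.qconfig = torch.quantization.get_default_qconfig("fbgemm")  # x86 server backend
prepared = torch.quantization.prepare(model)  # inserts activation observers

# Calibration: feed representative inputs so observers record activation ranges.
with torch.no_grad():
    for _ in range(10):
        prepared(torch.randn(2, 8))

quantized = torch.quantization.convert(prepared)  # swaps in int8 modules
out = quantized(torch.randn(2, 8))
print(out.shape)
```

The quality of the chosen quantization parameters depends directly on how representative the calibration inputs are of real inference traffic, which is why the documentation emphasizes collecting a realistic calibration set.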