NVIDIA TensorRT-LLM Documentation, NVIDIA, 2024 - The official resource for TensorRT-LLM, detailing its architecture, optimization techniques, and support for quantized models on NVIDIA GPUs.
Text Generation Inference (TGI) Documentation, Hugging Face, 2024 - The official guide for deploying LLMs with Text Generation Inference, including its integration with the quantization methods supported by Hugging Face.
ONNX Runtime Documentation, Microsoft, 2024 - Comprehensive documentation for ONNX Runtime, outlining its cross-platform capabilities, execution providers, and support for quantized ONNX models.