ONNX Runtime Documentation, Microsoft, 2024 - Covers the core functionality, execution providers, and graph optimizations of ONNX Runtime.
Optimum: Exporting models to ONNX, Hugging Face, 2024 - Explains how to use the Optimum library to export Transformer models, including quantized ones, to the ONNX format.
NVIDIA TensorRT Developer Guide, NVIDIA Corporation, 2024 - Describes NVIDIA TensorRT, an SDK for high-performance deep learning inference; relevant background for ONNX Runtime's TensorrtExecutionProvider.