NVIDIA Triton Inference Server Documentation, NVIDIA Corporation, 2024 - Official documentation for NVIDIA's open-source inference serving software, detailing features such as dynamic batching, concurrent model execution, and extensibility.
NVIDIA TensorRT-LLM Documentation, NVIDIA Corporation, 2025 - Official documentation for NVIDIA's library for optimizing large language model (LLM) inference, covering kernel optimizations, quantization, and in-flight batching.