Python toolkit for building production-ready LLM applications. Modular utilities for prompts, RAG, agents, structured outputs, and multi-provider support.
NVIDIA Triton Inference Server Documentation, NVIDIA Corporation, 2024 - Official documentation for NVIDIA's open-source inference serving software, covering features such as dynamic batching, concurrent model execution, and extensibility.
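The dynamic batching the Triton entry mentions means the server transparently groups individual inference requests into larger batches, waiting a bounded time for a batch to fill before dispatching it. A minimal pure-Python sketch of that idea (the function name `dynamic_batch` and all parameters are illustrative, not Triton's API):

```python
import time
from collections import deque

def dynamic_batch(queue, max_batch_size, max_wait_s, now=time.monotonic):
    """Group pending requests into one batch, waiting up to max_wait_s
    for the batch to fill. A toy model of a dynamic batcher, not
    Triton's actual scheduler."""
    batch = []
    deadline = now() + max_wait_s
    while len(batch) < max_batch_size:
        if queue:
            batch.append(queue.popleft())
        elif now() < deadline:
            time.sleep(0.001)  # wait briefly for more requests to arrive
        else:
            break  # timeout reached: dispatch a partial batch
    return batch

# Usage: five queued requests, batches capped at 4
requests = deque(range(5))
first = dynamic_batch(requests, max_batch_size=4, max_wait_s=0.01)
second = dynamic_batch(requests, max_batch_size=4, max_wait_s=0.01)
print(first, second)  # [0, 1, 2, 3] [4]
```

The trade-off this models: a longer `max_wait_s` raises batch sizes (and GPU utilization) at the cost of per-request latency; in Triton this is tuned via the model's batching configuration.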
NVIDIA TensorRT-LLM Documentation, NVIDIA Corporation, 2025 - Official documentation for NVIDIA's library for optimizing Large Language Model inference, covering kernel optimizations, quantization, and in-flight batching.
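The quantization covered by the TensorRT-LLM docs reduces weight precision (e.g. FP16 to INT8) to shrink memory traffic and speed up inference. A minimal sketch of symmetric per-tensor int8 quantization, the simplest such scheme; the helper names here are illustrative, and TensorRT-LLM's real schemes (per-channel scales, AWQ, FP8, etc.) are considerably more involved:

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: map floats to [-127, 127]
    using a single scale derived from the largest magnitude."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.02, 1.0]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
# q holds small integers; approx matches weights to within one scale step
```

The error of the round trip is bounded by half a quantization step (`scale / 2`) per weight, which is why the largest-magnitude weight dominates accuracy in per-tensor schemes and motivates per-channel scaling.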