Text Generation Inference, Hugging Face, 2024 - Official documentation for a popular open-source serving solution that implements continuous batching and other optimizations for LLM inference.
NVIDIA TensorRT-LLM Documentation, NVIDIA, 2024 - Official documentation for NVIDIA's high-performance inference library, which includes optimizations such as continuous batching and efficient KV cache management.