Text Generation Inference (TGI) Documentation, Hugging Face, 2024 - Official documentation for Hugging Face's production-ready LLM serving framework, detailing features such as continuous batching, quantization support, and deployment considerations.
NVIDIA TensorRT-LLM Documentation, NVIDIA Corporation, 2024 - Official documentation for NVIDIA's highly optimized LLM inference library, explaining techniques such as in-flight batching (NVIDIA's term for continuous batching) and its paged KV cache implementation on NVIDIA GPUs.