Accelerating Large Language Model Inference with NVIDIA FasterTransformer and Triton Inference Server, Apoorv Bansal, Bin Zhang, Bo Li, Chintan Shah, David W. Ho, Hanwen Chang, Jie Ren, Jike Li, Kuan Wang, Long Lu, Luyang Liu, Pranav Kashyap, Santhosh Tumma, Shuo Yang, Xiaodi Lu, Yichi Zhang, Yiran Shao, Yongmin Li, Zhongshuai Wang, and Zhen Jia, 2023, NVIDIA Developer Blog (NVIDIA) - Explains how to serve LLMs with optimized inference using NVIDIA's FasterTransformer backend for Triton Inference Server.