Dynamic Batching in Triton Inference Server, NVIDIA Corporation, 2023 - Official documentation explaining the configuration and behavior of dynamic batching within a popular inference serving framework.
Clipper: A Low-Latency Online Prediction Serving System, Daniel Crankshaw, Xin Wang, Giulio Zhou, Michael J. Franklin, Joseph E. Gonzalez, Ion Stoica, 2017, 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI '17), USENIX Association - A foundational paper on an online prediction serving system that addresses latency and throughput challenges through techniques including request aggregation.