Python toolkit for building production-ready LLM applications. Modular utilities for prompts, RAG, agents, structured outputs, and multi-provider support.
ZeRO-Offload: Democratizing Billion-Scale Model Training, Jie Ren, Samyam Rajbhandari, Reza Yazdani Aminabadi, Olatunji Ruwase, Shuangyan Yang, Minjia Zhang, Dong Li, Yuxiong He, 2021. arXiv preprint arXiv:2101.06840. DOI: 10.48550/arXiv.2101.06840 - Describes a memory offloading technique for deep learning models, where model states are moved to CPU memory to free up GPU memory, with principles applicable to LLM inference.
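For intuition, here is a minimal sketch of the offloading principle applied to inference, assuming PyTorch: parameters stay in CPU memory and each layer is streamed to the GPU only while it computes. The `offloaded_forward` helper and the toy layer stack are hypothetical illustrations, not the paper's ZeRO-Offload implementation (which targets training and additionally offloads optimizer states and the parameter update to the CPU while overlapping transfers with computation).

```python
import torch
import torch.nn as nn

def offloaded_forward(layers: nn.ModuleList, x: torch.Tensor, device: str) -> torch.Tensor:
    """Run layers sequentially, keeping only the active layer's weights on `device`."""
    x = x.to(device)
    for layer in layers:
        layer.to(device)        # stream this layer's weights to the GPU
        with torch.no_grad():
            x = layer(x)        # compute while the weights are resident
        layer.to("cpu")         # evict weights back to CPU memory, freeing GPU memory
    return x

if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    # Toy stand-in for transformer blocks; the full model never needs to fit on the GPU at once.
    layers = nn.ModuleList(nn.Linear(1024, 1024) for _ in range(8))
    out = offloaded_forward(layers, torch.randn(2, 1024), device)
    print(out.shape)  # torch.Size([2, 1024])
```

The trade-off this sketch makes explicit is the core one discussed in the paper: peak GPU memory drops to roughly one layer's worth of weights plus activations, at the cost of repeated CPU-GPU transfers, which practical systems hide by prefetching the next layer while the current one computes.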