Python toolkit for building production-ready LLM applications. Modular utilities for prompts, RAG, agents, structured outputs, and multi-provider support.
Proximal Policy Optimization Algorithms, John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov, 2017. arXiv preprint arXiv:1707.06347. DOI: 10.48550/arXiv.1707.06347 - Introduces the PPO algorithm, a widely used reinforcement learning method for policy optimization.
LoRA: Low-Rank Adaptation of Large Language Models, Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, 2021. arXiv preprint arXiv:2106.09685. DOI: 10.48550/arXiv.2106.09685 - Presents Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning technique that significantly reduces memory and computational requirements, especially for PPO with LLMs.
ZeRO: Memory Optimizations Toward Training Trillion Parameter Models, Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, Yuxiong He, 2020. SC '20: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. DOI: 10.1109/SC41405.2020.00024 - Introduces ZeRO (Zero Redundancy Optimizer), a suite of memory optimization techniques for distributed training that help train models with billions to trillions of parameters.
Mixed Precision Training, Paulius Micikevicius, Sharan Narang, Jonah Alben, Gregory Diamos, Erich Elsen, David Garcia, Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, Hao Wu, 2018. International Conference on Learning Representations (ICLR). DOI: 10.48550/arXiv.1710.03740 - Describes mixed-precision training using FP16, a technique for reducing memory footprint and speeding up computation on compatible hardware.
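The core idea behind the LoRA reference above can be sketched in a few lines: instead of updating a full d×d weight matrix W, LoRA learns a low-rank update B·A with rank r ≪ d, so only d·r + r·d parameters are trained while W stays frozen. The following is a minimal, dependency-free illustration (not the paper's implementation); the dimensions, the identity W, and the scaling factor alpha are illustrative assumptions, and matrices are plain lists of lists.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_forward(W, A, B, x, alpha, r):
    """Compute (W + (alpha / r) * B @ A) @ x for a column vector x,
    without ever materializing the summed matrix."""
    base = matmul(W, x)                 # frozen pretrained path: W @ x
    low_rank = matmul(B, matmul(A, x))  # trainable path: B @ (A @ x)
    scale = alpha / r
    return [[b[0] + scale * l[0]] for b, l in zip(base, low_rank)]

# Illustrative sizes: d = 4, r = 1. A full update would train 16
# parameters; the LoRA factors A (1x4) and B (4x1) train only 8.
d, r, alpha = 4, 1, 2.0
W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # frozen
A = [[0.5] * d]                      # r x d, trainable
B = [[1.0] for _ in range(d)]        # d x r, trainable
x = [[1.0], [2.0], [3.0], [4.0]]     # input column vector
y = lora_forward(W, A, B, x, alpha, r)
```

Because only A and B receive gradients, optimizer state shrinks accordingly, which is why the annotation above notes LoRA's usefulness for memory-hungry methods like PPO on large models.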