Deep Reinforcement Learning from Human Preferences, Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, Dario Amodei, 2017, arXiv preprint arXiv:1706.03741, DOI: 10.48550/arXiv.1706.03741 - This paper presents an early approach to training a reward model from human preference comparisons and using it to train deep reinforcement learning agents. It laid the groundwork for preference modeling in later RL work, including reinforcement learning from human feedback (RLHF) for LLMs.