Constitutional AI: Harmlessness from AI Feedback, Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, et al., 2022, arXiv preprint, DOI: 10.48550/arXiv.2212.08073 - Describes how AI feedback can align LLMs with human values and safety principles, training models to moderate their own behavior without extensive human intervention; relevant to the LLM-as-a-judge technique.
Holistic Evaluation of Language Models, Percy Liang, Rishi Bommasani, Tony Lee, Dimitris Tsipras, Dilara Soylu, Michihiro Yasunaga, Yian Zhang, Deepak Narayanan, Yuhuai Wu, et al., 2023, Transactions on Machine Learning Research, DOI: 10.48550/arXiv.2211.09110 - Provides a framework for evaluating LLMs across criteria including safety, fairness, and robustness, offering context for assessing the overall reliability and risks that guardrails address.
Google Cloud Perspective API Documentation, Google Cloud, 2024 - Official documentation for a widely used content moderation API that uses machine learning models to detect categories of harmful or toxic content; an example of the model-based classifiers used for output guardrails.
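To make the last entry concrete, here is a minimal sketch of using a model-based classifier as an output guardrail: a model response is scored for toxicity via the Perspective API `comments:analyze` endpoint and blocked if the score exceeds a threshold. The endpoint, request payload, and response fields follow the public Perspective API documentation; the `output_guardrail` wrapper, the 0.8 threshold, and the fallback message are illustrative assumptions, not part of the documented API.

```python
import os
import requests

# Documented Perspective API endpoint for analyzing a piece of text.
PERSPECTIVE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"


def toxicity_score(text: str, api_key: str) -> float:
    """Return the TOXICITY summary score (0.0-1.0) that Perspective assigns to `text`."""
    payload = {
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}},
    }
    resp = requests.post(PERSPECTIVE_URL, params={"key": api_key}, json=payload, timeout=10)
    resp.raise_for_status()
    return resp.json()["attributeScores"]["TOXICITY"]["summaryScore"]["value"]


def output_guardrail(model_output: str, threshold: float = 0.8) -> str:
    """Pass the model response through only if its toxicity score is below the threshold.

    The threshold and fallback message are assumptions for illustration.
    """
    score = toxicity_score(model_output, os.environ["PERSPECTIVE_API_KEY"])
    if score >= threshold:
        return "Sorry, I can't share that response."
    return model_output
```

In practice the threshold and the set of requested attributes (for example SEVERE_TOXICITY or THREAT) would be tuned to the application's risk tolerance rather than fixed as above.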