Designing Machine Learning Systems: An Iterative Process for Production-Ready Applications, Chip Huyen, 2022 (O'Reilly Media) - Provides a comprehensive overview of MLOps, including various deployment strategies like canary releases, blue/green deployments, and A/B testing, which are fundamental for safe LLM rollouts.
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned, Deep Ganguli, Liane Lovitt, Jackson Kernion, Amanda Askell, Yuntao Bai, Saurav Kadavath, Ben Mann, Ethan Perez, Nicholas Schiefer, Kamal Ndousse, Andy Jones, Sam Bowman, Anna Chen, Tom Conerly, Nova DasSarma, Dawn Drain, Nelson Elhage, Sheer El-Showk, Stanislav Fort, Zac Hatfield-Dodds, Tom Henighan, Danny Hernandez, Tristan Hume, Josh Jacobson, Scott Johnston, Shauna Kravec, Catherine Olsson, Sam Ringer, Eli Tran-Johnson, Dario Amodei, Tom Brown, Nicholas Joseph, Sam McCandlish, Chris Olah, Jared Kaplan, Jack Clark, 2022, arXiv preprint arXiv:2209.07858, DOI: 10.48550/arXiv.2209.07858 - Discusses systematic approaches to identifying and mitigating safety risks in LLMs through red teaming, a critical pre-deployment step for ensuring safety.
The System Card: A Documenting Approach to Responsible AI Development, Saleema Amershi, Anna Roth, Lauren Huneke, Rachel K. E. Bellamy, Jennifer Wortman Vaughan, Hanna Wallach, Meredith Ringel Morris, 2023, Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (ACM), DOI: 10.1145/3544548.3581177 - Provides a framework for documenting AI systems for responsible development, emphasizing continuous evaluation and monitoring throughout the lifecycle, including post-deployment vigilance.
Holistic Evaluation of Language Models, Percy Liang, Rishi Bommasani, Tony Lee, Dimitris Tsipras, et al., 2023, Transactions on Machine Learning Research, DOI: 10.48550/arXiv.2211.09110 - Presents a comprehensive framework for evaluating LLMs across diverse scenarios and metrics, providing principles essential for designing pre-deployment safety evaluations and ongoing monitoring.