Constitutional AI: Harmlessness from AI Feedback, Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, Cameron McKinnon, Carol Chen, Catherine Olsson, Christopher Olah, Danny Hernandez, Dawn Drain, Deep Ganguli, Dustin Li, Eli Tran-Johnson, Ethan Perez, Jamie Kerr, Jared Mueller, Jeffrey Ladish, Joshua Landau, Kamal Ndousse, Kamile Lukosuite, Liane Lovitt, Michael Sellitto, Nelson Elhage, Nicholas Schiefer, Noemi Mercado, Nova DasSarma, Robert Lasenby, Robin Larson, Sam Ringer, Scott Johnston, Shauna Kravec, Sheer El Showk, Stanislav Fort, Tamera Lanham, Timothy Telleen-Lawton, Tom Conerly, Tom Henighan, Tristan Hume, Samuel R. Bowman, Zac Hatfield-Dodds, Ben Mann, Dario Amodei, Nicholas Joseph, Sam McCandlish, Tom Brown, Jared Kaplan, 2022. arXiv preprint arXiv:2212.08073. DOI: 10.48550/arXiv.2212.08073 - Although primarily focused on safety, this paper introduces a method in which an AI model critiques and revises its own outputs according to a set of principles, demonstrating a robust form of AI-driven self-evaluation and correction.