A General Language Assistant as a Laboratory for Alignment. Amanda Askell, Yuntao Bai, Anna Chen, Dawn Drain, Deep Ganguli, Tom Henighan, Andy Jones, Nicholas Joseph, Ben Mann, Nova DasSarma, Nelson Elhage, Zac Hatfield-Dodds, Danny Hernandez, Jackson Kernion, Kamal Ndousse, Catherine Olsson, Dario Amodei, Tom Brown, Jack Clark, Sam McCandlish, Chris Olah, Jared Kaplan. 2021. arXiv preprint. DOI: 10.48550/arXiv.2112.00861 - Introduces the HHH (Helpful, Honest, Harmless) framework for LLM alignment, detailing the complexities and human judgment involved in defining and achieving it.