Semantics derived automatically from language corpora contain human-like biases, Aylin Caliskan, Joanna J. Bryson, Arvind Narayanan, 2017. Science, Vol. 356 (American Association for the Advancement of Science). DOI: 10.1126/science.aal4230 - Introduces the Word Embedding Association Test (WEAT) and demonstrates that widely used word embeddings exhibit human-like biases, providing a foundational method for the quantitative bias detection discussed in the text (a minimal sketch of the WEAT effect size appears after this list).
On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜, Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, Shmargaret Shmitchell, 2021. Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (Association for Computing Machinery). DOI: 10.1145/3442188.3445922 - A seminal paper on the ethical and societal risks of large language models, including the perpetuation of biases and the potential for harmful outputs, underscoring the need for ongoing monitoring.
Holistic Evaluation of Language Models, Percy Liang, Rishi Bommasani, Tony Lee, Dimitris Tsipras, Dilara Soylu, Michihiro Yasunaga, Yian Zhang, Deepak Narayanan, Yuhuai Wu, Ananya Kumar, Benjamin Newman, Binhang Yuan, Bobby Yan, Ce Zhang, Christian Cosgrove, Christopher D. Manning, Christopher Ré, Diana Acosta-Navas, Drew T. Hudson, Eric Zelikman, Esin Durmus, Faisal Ladhak, Frieda Rong, Hongyu Ren, Huaxiu Yao, Jue Wang, Keshav Santhanam, Laurel Orr, Lucia Zheng, Mert Yuksekgonul, Mirac Suzgun, Nathan Kim, Neel Guha, Niladri Chatterji, Omar Khattab, Peter Henderson, Qian Huang, Ryan Chi, Sang Michael Xie, Shibani Santurkar, Surya Ganguli, Tatsunori Hashimoto, Thomas Icard, Tianyi Zhang, Vishrav Chaudhary, William Wang, Xuechen Li, Yifan Mai, Yuhui Zhang, Yuta Koreeda, 2023. Transactions on Machine Learning Research. DOI: 10.48550/arXiv.2211.09110 - Introduces HELM, a comprehensive framework for evaluating language models across multiple criteria, including fairness and toxicity, providing methodology relevant to continuous monitoring in production (a sketch of a HELM-style multi-metric harness appears after this list).
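
The WEAT statistic from Caliskan et al. compares two target word sets X and Y against two attribute sets A and B using cosine similarity between embedding vectors. The sketch below illustrates only the effect-size formula (it omits the permutation test that yields the p-value); the `emb` lookup and the example word sets in the comments are assumptions for illustration, not the paper's released code.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, A, B):
    """s(w, A, B): mean similarity of w to attribute set A minus attribute set B."""
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    """WEAT effect size d: difference in mean associations of the target sets X
    and Y, normalised by the standard deviation of associations over X union Y."""
    assoc_x = [association(x, A, B) for x in X]
    assoc_y = [association(y, A, B) for y in Y]
    return (np.mean(assoc_x) - np.mean(assoc_y)) / np.std(assoc_x + assoc_y)

# Hypothetical usage: `emb` is any word -> vector mapping (e.g. loaded GloVe vectors),
# X_words/Y_words are target sets (e.g. career vs. family words) and A_words/B_words
# are attribute sets (e.g. male vs. female names). By Cohen's convention, |d| >= 0.8
# is a large effect.
# d = weat_effect_size([emb[w] for w in X_words], [emb[w] for w in Y_words],
#                      [emb[w] for w in A_words], [emb[w] for w in B_words])
```

Running such a test against the embedding layer (or a frozen reference embedding) on a schedule gives a simple quantitative bias signal that can be tracked alongside other production metrics.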
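
HELM's core idea is to score the same model on many scenarios with many metrics rather than reporting a single leaderboard number. The following is a minimal, hypothetical harness sketching that pattern for production monitoring; it does not use the actual `crfm-helm` package or its API, and the function, scenario, and metric names are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class EvalResult:
    scenario: str   # e.g. "question_answering", "toxicity_probe"
    metric: str     # e.g. "accuracy", "toxicity", "demographic_parity_gap"
    score: float    # mean score over the scenario's prompts

def evaluate(model: Callable[[str], str],
             scenarios: Dict[str, List[str]],
             metrics: Dict[str, Callable[[str, str], float]]) -> List[EvalResult]:
    """Run every prompt in every scenario through the model and aggregate each
    metric over the completions, yielding one row per (scenario, metric) pair."""
    results: List[EvalResult] = []
    for scenario_name, prompts in scenarios.items():
        completions = [model(p) for p in prompts]
        for metric_name, metric_fn in metrics.items():
            scores = [metric_fn(p, c) for p, c in zip(prompts, completions)]
            results.append(EvalResult(scenario_name, metric_name,
                                      sum(scores) / len(scores)))
    return results

# Hypothetical usage: pair a model client with metric functions such as an
# exact-match accuracy check and a toxicity-classifier score, run `evaluate`
# on every release (or on sampled production traffic), and push the resulting
# rows to a dashboard to track fairness and toxicity drift over time.
```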