Machine Learning Engineering, Andriy Burkov, 2020 (True Positive Inc.) - This book offers a comprehensive guide to deploying and maintaining machine learning models, including material on monitoring, data drift, and operational metrics in production environments.
Holistic Evaluation of Language Models, Percy Liang, Rishi Bommasani, Tony Lee, Dimitris Tsipras, Dilara Soylu, Michihiro Yasunaga, Yian Zhang, Deepak Narayanan, Yuhuai Wu, Ananya Kumar, Benjamin Newman, Binhang Yuan, Bobby Yan, Ce Zhang, Christian Cosgrove, Christopher D. Manning, Christopher Ré, Diana Acosta-Navas, Drew A. Hudson, Eric Zelikman, Esin Durmus, Faisal Ladhak, Frieda Rong, Hongyu Ren, Huaxiu Yao, Jue Wang, Keshav Santhanam, Laurel Orr, Lucia Zheng, Mert Yuksekgonul, Mirac Suzgun, Nathan Kim, Neel Guha, Niladri Chatterji, Omar Khattab, Peter Henderson, Qian Huang, Ryan Chi, Sang Michael Xie, Shibani Santurkar, Surya Ganguli, Tatsunori Hashimoto, Thomas Icard, Tianyi Zhang, Vishrav Chaudhary, William Wang, Xuechen Li, Yifan Mai, Yuhui Zhang, Yuta Koreeda, 2023, Transactions on Machine Learning Research (TMLR), Vol. 162. DOI: 10.48550/arXiv.2211.09110 - This paper introduces HELM, a systematic evaluation framework for language models that covers accuracy, fairness, robustness, and efficiency, and that can inform continuous monitoring of LLMs in production.