Introduction to Information Retrieval, Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze, 2008 (Cambridge University Press) - 一本标准教科书,涵盖信息检索的基本概念和指标,包括准确率、召回率和排名指标。
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena, Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric P. Xing, Hao Zhang, Joseph E. Gonzalez, Ion Stoica, 2023NeurIPS 2023 Datasets and Benchmarks TrackDOI: 10.48550/arXiv.2306.05685 - 研究使用大型语言模型作为评估其他大型语言模型裁判的有效性和可靠性,与基于模型的评估相关。