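Introduction to Information Retrieval, Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze, 2008 (Cambridge University Press) - Provides comprehensive coverage of core information retrieval metrics such as precision, recall, and MRR, which are essential for evaluating the retrieval component of a RAG system.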
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena, Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric P. Xing, Hao Zhang, Joseph E. Gonzalez, Ion Stoica, 2023, NeurIPS 2023 Datasets and Benchmarks Track. DOI: 10.48550/arXiv.2306.05685 - Examines the effectiveness and limitations of using large language models as automatic evaluators for generation tasks (LLM-as-a-judge), an approach directly relevant to evaluating the faithfulness and answer relevance of RAG outputs.