Introduction to Information Retrieval, Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze, 2008 (Cambridge University Press) - A standard textbook covering fundamental information retrieval concepts and metrics like Precision, Recall, MRR, and nDCG.
AgentBench: Evaluating LLMs as Agents, Xiao Liu, Hao Yu, Hanchen Zhang, Yifan Xu, Xuanyu Lei, Hanyu Lai, Yu Gu, Hangliang Ding, Kaiwen Men, Kejuan Yang, Shudan Zhang, Xiang Deng, Aohan Zeng, Zhengxiao Du, Chenhui Zhang, Sheng Shen, Tianjun Zhang, Yu Su, Huan Sun, Minlie Huang, Yuxiao Dong, Jie Tang, 2023, arXiv preprint arXiv:2308.03688, DOI: 10.48550/arXiv.2308.03688 - Proposes AgentBench, a benchmark for evaluating LLM agents across diverse real-world tasks, offering insights into end-to-end agent evaluation.