Introduction to Information Retrieval, Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze, 2008 (Cambridge University Press) - A foundational textbook covering information retrieval theory, including comprehensive details on ranking evaluation metrics.
Cumulated gain-based evaluation of IR techniques, Kalervo Järvelin and Jaana Kekäläinen, 2002, ACM Transactions on Information Systems (ACM TOIS), Vol. 20 (ACM), DOI: 10.1145/582415.582418 - Introduces and details the Normalized Discounted Cumulative Gain (NDCG) metric, a standard for evaluating ranked retrieval systems with graded relevance.
MS MARCO: A Human Generated MAchine Reading COmprehension Dataset, Payal Bajaj, Daniel Campos, Nick Craswell, Li Deng, Jianfeng Gao, Xiaodong Liu, Rangan Majumder, Andrew McNamara, Bhaskar Mitra, Tri Nguyen, Mir Rosenberg, Xia Song, Alina Stoica, Saurabh Tiwary, and Tong Wang, 2016, Proceedings of the Neural Information Processing Systems (NIPS) Workshop on Conversational AI, DOI: 10.48550/arXiv.1611.09268 - Describes the MS MARCO dataset, a large-scale collection with relevance judgments that is widely used for training and evaluating information retrieval and reading comprehension systems.
Neural Information Retrieval: A Review, Sean MacAvaney, Daniel Cohen, Nazir Nayal, and Andrew Yates, 2020, Foundations and Trends® in Information Retrieval, Vol. 14 (now publishers), DOI: 10.1561/1500000072 - Provides a comprehensive overview of modern neural information retrieval, covering evaluation practices and the challenges specific to these systems.