Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela, 2020. Advances in Neural Information Processing Systems (NeurIPS). DOI: 10.48550/arXiv.2005.11401 - A seminal paper that introduced the Retrieval-Augmented Generation (RAG) paradigm, which integrates retrieval into the generation process to ground large language models in external knowledge.
Introduction to Information Retrieval, Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze, 2008 (Cambridge University Press) - A foundational textbook covering information retrieval principles, including keyword-based search and vector space models, which are relevant for building both traditional and semantic search components of a hybrid retrieval system.
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks, Nils Reimers and Iryna Gurevych, 2019. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). DOI: 10.48550/arXiv.1908.10084 - Introduces a method for deriving semantically meaningful sentence embeddings that are highly effective for tasks like semantic search and information retrieval, a core component of the RAG pipeline.