Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela, 2020, Advances in Neural Information Processing Systems, Vol. 33 (Curran Associates, Inc.), DOI: 10.55919/neurips.2020.00994 - This foundational paper introduces the Retrieval-Augmented Generation (RAG) paradigm, highlighting the importance of retrieving relevant information from a knowledge base to improve language model responses. It supports the need for well-prepared source data where contextual metadata can enhance retrieval.
Nodes, Documents, and Metadata, LlamaIndex Documentation, 2024 (LlamaIndex) - This conceptual guide from a popular RAG framework explains how text is structured into nodes (chunks) and how essential metadata is associated with these nodes for effective indexing and retrieval, providing practical insights into implementation.