Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, Patrick S. H. Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela, 2020. Advances in Neural Information Processing Systems, Vol. 33. DOI: 10.48550/arXiv.2005.11401 - Introduces Retrieval-Augmented Generation (RAG) and outlines its fundamental architecture, including the preparation of knowledge sources.
Understanding tokens and context windows, OpenAI, 2025 - Explains how tokens work, what a context window is, and why both matter in Large Language Models.
A Survey on Retrieval-Augmented Generation, Yunfan Shao, Zhicheng Dou, Jiantao Ji, Xiaoxue Li, et al., 2024. arXiv preprint arXiv:2401.12192. DOI: 10.48550/arXiv.2401.12192 - A recent survey offering a comprehensive review of Retrieval-Augmented Generation, with sections dedicated to data preparation techniques like text chunking.
Text Splitters Overview, LangChain - Describes various text splitting strategies used in RAG systems, illustrating methods beyond simple fixed-size chunks to preserve context.