So far, we have worked with Large Language Models that operate on their internal, pre-trained knowledge. This approach has a clear limitation: the model is unaware of any data created after its training cut-off date and has no access to private or domain-specific information. This chapter introduces a method to overcome this by connecting an LLM to external data sources.
This method is known as Retrieval-Augmented Generation (RAG). The core idea is to first retrieve relevant documents from an external knowledge base and then provide them to the LLM as context for generating a response. This grounds the model's output in factual, timely, and specific information.
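Before building the full system, it helps to see this pattern in miniature. The sketch below is a minimal, library-free illustration of the retrieve-then-generate loop; the `knowledge_base` contents, the word-overlap scoring, and the prompt template are all made-up stand-ins, not the real components we build later in this chapter.

```python
# A minimal, library-free sketch of the retrieve-then-generate loop.
# The documents and the word-overlap scoring are illustrative placeholders.

knowledge_base = [
    "RAG systems retrieve documents before generating an answer.",
    "Vector stores index embeddings for fast similarity search.",
    "Document loaders ingest PDFs, text files, and other formats.",
]

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by naive word overlap with the query and keep the top k."""
    query_words = set(query.lower().split())
    ranked = sorted(
        docs,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Ground the LLM by prepending the retrieved context to the question."""
    context_block = "\n".join(f"- {doc}" for doc in context)
    return f"Answer using only this context:\n{context_block}\n\nQuestion: {query}"

question = "What does a vector store do?"
prompt = build_prompt(question, retrieve(question, knowledge_base))
print(prompt)  # In a real system, this prompt is sent to the LLM.
```

In a production system, the naive word-overlap scoring is replaced by embedding-based similarity search, which is exactly what the vector store components in this chapter provide.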
Throughout this chapter, we will construct a RAG system piece by piece. You will learn how to:
- Use DocumentLoaders to ingest data from various file formats like PDFs and text files.
- Split long documents into manageable chunks with TextSplitters.
- Generate embeddings for those chunks and index them in a VectorStore for efficient similarity searches, where relevance is determined by the distance between a query vector $v_q$ and document vectors $v_d$ (a small worked example follows this list).
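To make the last point concrete, the sketch below scores two documents against a query using cosine similarity, $\mathrm{sim}(v_q, v_d) = \frac{v_q \cdot v_d}{\lVert v_q \rVert \, \lVert v_d \rVert}$. The three-dimensional vectors are toy values chosen for readability; real embedding models produce vectors with hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(v_q: list[float], v_d: list[float]) -> float:
    """Similarity between a query vector and a document vector; higher means closer."""
    dot = sum(q * d for q, d in zip(v_q, v_d))
    norm_q = math.sqrt(sum(q * q for q in v_q))
    norm_d = math.sqrt(sum(d * d for d in v_d))
    return dot / (norm_q * norm_d)

# Toy 3-dimensional "embeddings"; real models emit far longer vectors.
query_vec = [0.9, 0.1, 0.3]
doc_vecs = {
    "doc_a": [0.8, 0.2, 0.4],   # points in nearly the same direction as the query
    "doc_b": [-0.5, 0.9, 0.1],  # points in a very different direction
}

for name, vec in doc_vecs.items():
    print(name, round(cosine_similarity(query_vec, vec), 3))
# doc_a scores near 1.0, doc_b scores negative. A vector store automates
# exactly this ranking, with data structures built for large-scale search.
```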
5.1 Architecture of a RAG System
5.2 Loading Data with Document Loaders
5.3 Splitting Documents for Processing
5.4 Vector Stores and Embeddings
5.5 Fetching Data with Retrievers
5.6 Building a Question-Answering Chain
5.7 Hands-on Practical: Q&A over Your Documents