The preceding chapters introduced methods for processing documents and using embeddings to find semantically similar text. This chapter puts those components together to construct a complete question-answering system. A standard Large Language Model (LLM) generates responses based only on the knowledge captured during pre-training. Retrieval-Augmented Generation (RAG) extends this capability by supplying the model with external, up-to-date information at generation time.
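To make that augmentation step concrete before diving into the details, here is a minimal sketch: retrieved text is placed into the prompt alongside the question before the model generates an answer. The `build_rag_prompt` helper and its prompt wording are illustrative placeholders, not a fixed API from this book.

```python
def build_rag_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Combine retrieved document text with the user's question.

    A sketch of the core RAG idea: the model answers from supplied
    context rather than from its pre-training alone.
    """
    context = "\n\n".join(retrieved_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Example: one retrieved chunk, ready to send to a generation model.
chunks = ["The return policy allows refunds within 30 days of purchase."]
print(build_rag_prompt("How long do I have to return an item?", chunks))
```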
You will start by examining the full architecture of a RAG system, from data ingestion to final response generation. Following that, you will assemble your first end-to-end pipeline using the retrieval module. The chapter then covers methods for improving retrieval accuracy, such as implementing different search strategies and applying re-ranking models. Finally, you will learn how to format and manage the retrieved context before it is passed to the generation model.
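As a preview of the retrieval step you will build, the sketch below ranks a handful of document chunks against a query by cosine similarity. It substitutes a toy bag-of-words vector for a real embedding model so the example runs on its own; the `embed` and `top_k` helpers are hypothetical names chosen for illustration.

```python
import numpy as np

def embed(text: str, vocab: list[str]) -> np.ndarray:
    """Toy embedding: a word-count vector over a fixed vocabulary.

    A stand-in for the embedding models from earlier chapters.
    """
    words = text.lower().split()
    return np.array([words.count(w) for w in vocab], dtype=float)

def top_k(query: str, docs: list[str], vocab: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query by cosine similarity."""
    q = embed(query, vocab)
    scores = []
    for d in docs:
        v = embed(d, vocab)
        denom = np.linalg.norm(q) * np.linalg.norm(v)
        scores.append(q @ v / denom if denom else 0.0)
    order = np.argsort(scores)[::-1][:k]
    return [docs[i] for i in order]

docs = [
    "refunds are issued within 30 days",
    "shipping takes 5 business days",
]
vocab = sorted({w for d in docs for w in d.split()})
print(top_k("when are refunds issued", docs, vocab, k=1))
```

The later sections replace each piece of this sketch: real embeddings for `embed`, alternative search methods for the similarity scoring, and re-ranking on top of the initial `top_k` results.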
By the end of this chapter, you will have a practical understanding of how to build and refine RAG systems for answering questions over your own documents.
6.1 Anatomy of a RAG System
6.2 Creating a Simple Retrieval Pipeline
6.3 Implementing Different Search Methods
6.4 Improving Relevance with Re-ranking
6.5 Managing Retrieved Context for Generation