Querying a vector store containing document embeddings is a primary method for finding information relevant to a user's question. While vector stores offer methods for direct querying, LangChain provides a more standardized and versatile interface for this purpose: the Retriever.
A retriever is an object that implements a common interface for fetching documents based on a query. Its primary function is to accept a string query and return a list of Document objects. This abstraction is significant because it decouples the logic of your application from the specific data source. A retriever might fetch data from a vector store, a traditional SQL database, or even a web API, but your application code interacts with it in the same way.
The retriever acts as a generic interface between the application's logic and various underlying data sources.
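To make the interface concrete, here is a minimal sketch of a custom retriever built on LangChain's BaseRetriever class (assuming a recent version where it lives in langchain_core). The StaticRetriever class and its fixed document list are purely illustrative; the point is that any retriever, whatever its data source, is called through the same invoke() method and returns a list of Document objects.
from typing import List
from langchain_core.callbacks import CallbackManagerForRetrieverRun
from langchain_core.documents import Document
from langchain_core.retrievers import BaseRetriever

class StaticRetriever(BaseRetriever):
    """A toy retriever that always returns the same fixed documents."""
    docs: List[Document]

    def _get_relevant_documents(
        self, query: str, *, run_manager: CallbackManagerForRetrieverRun
    ) -> List[Document]:
        # A real retriever would search a data source here
        return self.docs

static_retriever = StaticRetriever(
    docs=[Document(page_content="RAG combines retrieval with generation.")]
)
print(static_retriever.invoke("What is RAG?"))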
The most common way to instantiate a retriever is directly from an existing vector store object by calling its as_retriever() method. Let's assume you have an initialized vectorstore from the previous section.
# Assuming 'vectorstore' is an initialized Chroma or FAISS vector store object
retriever = vectorstore.as_retriever()
# The retriever is now ready to be used
query = "What is the architecture of a RAG system?"
relevant_docs = retriever.invoke(query)
# Print the content of the first retrieved document
print(relevant_docs[0].page_content)
Executing vectorstore.as_retriever() creates a VectorStoreRetriever instance. By default, this retriever is configured to perform a similarity search, finding the document vectors that are closest to the query vector in the embedding space.
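For reference, this default configuration can be written out explicitly. The sketch below assumes the usual defaults, a plain similarity search returning the top four documents; the exact default value of k may differ between vector store integrations.
# Spelling out the usual defaults (assumption: plain similarity search, k = 4)
retriever_default = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 4}
)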
The default similarity search is effective, but there are situations where you may need more control over the retrieval process. The as_retriever() method allows for customization through its arguments, primarily search_type and search_kwargs.
A frequent adjustment is changing the number of documents to retrieve. This is managed with the k parameter inside search_kwargs. The value of k determines how many documents are passed as context to the LLM. A smaller k is faster and uses fewer tokens, but may miss important context. A larger k provides more context but increases cost and can introduce noise.
# Create a retriever that fetches the top 5 most relevant documents
retriever_k5 = vectorstore.as_retriever(
    search_kwargs={"k": 5}
)
relevant_docs_k5 = retriever_k5.invoke(query)
print(f"Retrieved {len(relevant_docs_k5)} documents.")
# Expected output: Retrieved 5 documents.
Sometimes, the top k documents are very similar to each other, offering redundant information. To get a more diverse set of results, you can use the Maximal Marginal Relevance (MMR) search type. MMR works by first selecting the document most similar to the query and then iteratively selecting subsequent documents that represent the best combination of similarity to the query and dissimilarity to the documents already selected. This helps provide a broader perspective on the topic.
# Create a retriever that uses MMR to select documents
retriever_mmr = vectorstore.as_retriever(
    search_type="mmr"
)
relevant_docs_mmr = retriever_mmr.invoke(query)
Using mmr can be particularly effective for complex queries where different documents might cover different aspects of the answer.
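For vector stores that support them, MMR behavior can be tuned further through search_kwargs. The sketch below uses fetch_k (how many candidate documents to fetch before re-ranking) and lambda_mult (the relevance/diversity trade-off, where values closer to 1 favor relevance and values closer to 0 favor diversity); the specific numbers are illustrative.
# MMR with tuning: fetch 20 candidates, then re-rank down to 5 diverse documents
retriever_mmr_tuned = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 5, "fetch_k": 20, "lambda_mult": 0.5}
)
relevant_docs_mmr_tuned = retriever_mmr_tuned.invoke(query)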
Another useful technique is to filter results based on a relevance threshold. This ensures that you only retrieve documents that meet a minimum similarity score, which is helpful for filtering out irrelevant results, especially if the user's query is on a topic not well-covered in your knowledge base.
For vector stores that support it (like Chroma), you can use the similarity_score_threshold search type.
# Create a retriever that only returns documents with a similarity score of 0.7 or higher
retriever_threshold = vectorstore.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"score_threshold": 0.7}
)
relevant_docs_threshold = retriever_threshold.invoke(query)
This configuration prevents low-quality or unrelated documents from being passed to the LLM, which can improve the final answer's accuracy.
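A practical consequence of thresholding is that the retriever may return an empty list when nothing in the knowledge base is similar enough to the query, so it is worth handling that case explicitly. The off-topic query and fallback message below are illustrative.
# A query on a topic the knowledge base does not cover may return no documents
off_topic_docs = retriever_threshold.invoke("What is the best pizza topping?")
if not off_topic_docs:
    print("No sufficiently relevant documents were found for this query.")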
With a functional retriever configured to fetch the most relevant and diverse context for a given query, we now have all the components needed for our RAG system. The next section will show you how to combine this retriever with an LLM and a prompt template to build a complete question-answering chain.