The effectiveness of a Retrieval-Augmented Generation (RAG) system depends heavily on the quality of its retrieval step. Simply finding documents that contain the user's exact keywords is often not enough. To build an effective system, you need search methods that can understand the user's intent and find the most relevant information, whether the match is based on exact terms or on underlying meaning. The retrieval module provides several search methods to handle different scenarios.
Keyword search, also known as sparse retrieval, is the most traditional form of search. It works by matching the terms in your query with the terms in your document collection. This method excels at finding documents that contain specific, known identifiers, technical terms, or exact phrases. It's fast and doesn't require the computational overhead of generating embeddings.
The keyword_search function implements a BM25-like algorithm, which scores documents based on term frequency and inverse document frequency. This means it prioritizes documents where query terms appear frequently, while also giving more weight to rarer, more specific terms.
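The scoring idea can be sketched in plain Python. This is the textbook Okapi BM25 formula, not kerb's internal code (the text above describes keyword_search only as "BM25-like"). The tokenizer, the smoothed IDF variant, and the defaults k1=1.5 and b=0.75 are assumptions chosen for illustration. Besides term frequency and IDF, standard BM25 also caps the benefit of repeated terms (via k1) and normalizes for document length (via b).

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumed tokenizer: lowercase, then split on non-alphanumeric characters.
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with the Okapi BM25 formula."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Smoothed IDF (the +1 inside the log keeps it positive): rarer terms weigh more.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Saturating term frequency, normalized by document length.
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


docs = [
    "Asynchronous programming in Python allows for concurrent execution using async/await.",
    "JavaScript is an event-driven language that also supports asynchronous operations.",
    "REST APIs provide a standardized way for applications to communicate over HTTP.",
]
scores = bm25_scores("python async", docs)
for i, s in sorted(enumerate(scores), key=lambda pair: -pair[1]):
    print(f"doc{i + 1}: {s:.3f}")
```

Only doc1 scores above zero here, because it is the only sentence containing the tokens "python" and "async"; to this scorer, "asynchronous" in doc2 is a different token.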
Consider a scenario where you are searching through a collection of technical articles.
```python
from kerb.retrieval import Document, keyword_search

# A small collection of technical articles
documents = [
    Document(
        id="doc1",
        content="Asynchronous programming in Python allows for concurrent execution using async/await."
    ),
    Document(
        id="doc2",
        content="JavaScript is an event-driven language that also supports asynchronous operations."
    ),
    Document(
        id="doc3",
        content="REST APIs provide a standardized way for applications to communicate over HTTP."
    ),
]

query = "python async"
# Return the two highest-scoring documents for the query
results = keyword_search(query, documents, top_k=2)

print("Keyword Search Results:")
for result in results:
    print(f" Rank {result.rank}: {result.document.id} (Score: {result.score:.3f})")
```
In this case, doc1 should rank highest: assuming case-insensitive matching, it is the only document that contains "python" and "async" as words (to a keyword matcher, "asynchronous" in doc2 is a different term). Keyword search is highly effective for queries with unique or technical terms, such as a specific function name, an error message, or an industry-specific acronym.
However, its main limitation is its inability to understand synonyms or intent. A query for "making my app handle many users at once" would likely fail to find the document about "concurrent execution" because the vocabulary does not overlap. This is where semantic search becomes important.
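A quick way to see the problem is to compare the two word sets directly. This sketch assumes simple lowercase word tokenization, which may differ from kerb's; even with stemming, "users at once" would not map onto "concurrent".

```python
import re


def terms(text: str) -> set[str]:
    # Lowercased word tokens: the unit a pure keyword matcher compares.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


query = "making my app handle many users at once"
doc = "Asynchronous programming in Python allows for concurrent execution using async/await."

shared = terms(query) & terms(doc)
print(shared)  # prints set(): no shared terms, so term-based scoring gives this document zero
```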
Semantic search, or dense retrieval, moves beyond keyword matching to find documents that are similar in meaning to a query. It achieves this by representing both the query and the documents as numerical vectors called embeddings. The search then becomes a mathematical problem of finding the document vectors that are closest to the query vector in a high-dimensional space.
This approach is powerful because it can capture synonyms, paraphrasing, and the underlying intent of a query. For example, it can recognize that "building scalable web services" is semantically related to documents about "asynchronous programming" and "REST APIs", even though the exact keywords don't match.
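The "closest vectors" step can be illustrated with cosine similarity, a common closeness measure for embeddings. The three-dimensional vectors below are hand-picked toy values, not output from an embedding model, and the text does not state which similarity metric semantic_search uses internally.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product divided by the product of the vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def dense_top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int = 2) -> list[tuple[int, float]]:
    # Rank document indices by similarity to the query vector, highest first.
    sims = [(i, cosine(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(sims, key=lambda pair: pair[1], reverse=True)[:k]


# Toy "embeddings" for illustration only; real models produce
# hundreds or thousands of dimensions.
doc_vecs = [
    [0.9, 0.1, 0.3],  # doc1: async Python
    [0.7, 0.2, 0.1],  # doc2: async JavaScript
    [0.2, 0.9, 0.4],  # doc3: REST APIs
]
query_vec = [0.8, 0.1, 0.2]

top = dense_top_k(query_vec, doc_vecs, k=2)
print(top)
```

Here doc1 and doc2 come out on top because their vectors point in nearly the same direction as the query vector, while doc3 points elsewhere.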
To perform a semantic search, you first need to generate embeddings for your documents and your query.
```python
from kerb.retrieval import semantic_search
from kerb.embedding import embed, embed_batch

# Documents from the previous example
doc_texts = [doc.content for doc in documents]
# Embed every document once; these vectors can be reused across queries
doc_embeddings = embed_batch(doc_texts)

query = "building scalable web services"
query_embedding = embed(query)

results = semantic_search(
    query_embedding=query_embedding,
    documents=documents,
    document_embeddings=doc_embeddings,
    top_k=2
)

print("\nSemantic Search Results:")
for result in results:
    print(f" Rank {result.rank}: {result.document.id} (Similarity: {result.score:.3f})")
```
Semantic search is ideal for natural language questions and broad topic exploration, where the user may not know the exact terminology used in the source documents. Its main drawback is that it can miss a document containing an exact but obscure keyword, such as a rare identifier, when the rest of that document is not semantically close to the query. It also requires generating embeddings for every document up front, and regenerating them whenever the documents or the embedding model change.
To get the best of both worlds, you can implement hybrid search. This method combines the results of both keyword and semantic searches, leveraging the precision of term matching and the broader understanding of semantic similarity.
The hybrid_search function performs both searches and then fuses the results. You can control the influence of each search method by adjusting the keyword_weight and semantic_weight parameters. A higher keyword_weight favors exact matches, while a higher semantic_weight prioritizes meaning.
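One common way to fuse the two result lists is to rescale each set of scores to [0, 1] and take a weighted sum. The text does not say how hybrid_search fuses scores, so the min-max normalization below is an assumption, and the raw scores are hypothetical values chosen to show how the weights shift the ranking.

```python
def min_max(scores: list[float]) -> list[float]:
    # Rescale scores to [0, 1] so keyword and cosine scores are comparable.
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse(keyword_scores: list[float], semantic_scores: list[float],
         keyword_weight: float = 0.5, semantic_weight: float = 0.5) -> list[float]:
    # Weighted sum of normalized scores, one entry per document.
    kw = min_max(keyword_scores)
    sem = min_max(semantic_scores)
    return [keyword_weight * k + semantic_weight * s for k, s in zip(kw, sem)]


# Hypothetical raw scores for three documents (not produced by kerb).
keyword_scores = [2.1, 0.0, 0.0]       # only doc1 shares query terms
semantic_scores = [0.62, 0.81, 0.20]   # doc2 is closest in meaning

for kw_w in (0.8, 0.2):
    fused = fuse(keyword_scores, semantic_scores, kw_w, 1 - kw_w)
    best = max(range(len(fused)), key=fused.__getitem__)
    print(f"keyword_weight={kw_w}: best = doc{best + 1}, scores = {[round(f, 3) for f in fused]}")
```

With keyword_weight=0.8 the exact-match document (doc1) wins; with keyword_weight=0.2 the semantically closest document (doc2) wins. That is the trade-off the two weights control.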
```python
from kerb.retrieval import hybrid_search

# Reuses embed, documents, and doc_embeddings from the previous examples
query = "async python patterns"
query_embedding = embed(query)

# A balanced approach: keyword and semantic scores weighted equally
results = hybrid_search(
    query=query,
    query_embedding=query_embedding,
    documents=documents,
    document_embeddings=doc_embeddings,
    keyword_weight=0.5,
    semantic_weight=0.5,
    top_k=2
)

print("\nHybrid Search Results:")
for result in results:
    print(f" Rank {result.rank}: {result.document.id} (Score: {result.score:.3f})")
```
Hybrid search is often the most effective solution for RAG systems because it performs well across a wide range of query types. For a query like "async python patterns," it can retrieve documents that contain the exact term "async" while also finding documents that discuss related ideas like "concurrent execution," providing a more comprehensive context for the LLM.
The best search strategy depends on your documents and the types of queries you expect. A good starting point is to use hybrid search, as it is the most versatile. However, understanding the strengths of each method allows you to tune your system for optimal performance.
| Scenario | Recommended Method | Rationale |
|---|---|---|
| User searches for an exact error message | Keyword | The query contains specific, literal text that must be matched precisely. |
| User asks a broad, open-ended question | Semantic | The query's intent and meaning are more important than the exact words used. |
| User searches technical documentation | Hybrid | The query may contain specific technical terms (keywords) but also ask about broader applications (semantic). |
| User is looking for a specific code snippet | Keyword-heavy Hybrid | Syntax and function names are important, but the surrounding context also matters. |
By implementing the right search strategy, you ensure that the context passed to your LLM is as relevant and comprehensive as possible, directly improving the quality and accuracy of your RAG system's responses.
© 2026 ApX Machine Learning