Retrieval methods for augmenting language models are effective at identifying a broad collection of potentially relevant documents. However, these methods often prioritize finding all possible matches (recall) over ensuring that the top results are the most useful (precision). The result can be a context window filled with redundant or tangentially related information, which may confuse the LLM or waste valuable token space. Re-ranking serves as a second-stage process that refines the initial set of retrieved documents, applying more sophisticated scoring to select the most relevant, diverse, and useful documents for the generation model.
The simplest way to improve relevance is by re-ordering the initial search results based on additional signals. These signals often come from document metadata, such as publication dates or popularity metrics. The rerank_results function provides a straightforward way to apply these strategies.
For instance, after performing an initial search, you can re-rank the results to prioritize more recent or more popular documents. This is particularly useful in applications like news summarization or question-answering over community forums.
Let's start with an initial set of search results.
from kerb.retrieval import keyword_search, rerank_results
# Assume 'documents' is a list of Document objects with metadata
query = "python async programming"
initial_results = keyword_search(query, documents, top_k=10)
print("Initial keyword search results (top 4):")
for r in initial_results[:4]:
print(f" {r.rank}. {r.document.id} (score: {r.score:.3f})")
Now, we can apply different re-ranking methods. To favor newer documents, we use the recency method, which inspects date information in the metadata.
# Re-rank by recency
recency_ranked = rerank_results(query, initial_results, method="recency", top_k=4)
print("\nRe-ranked by RECENCY (date):")
for r in recency_ranked:
date = r.document.metadata.get('date', 'N/A')
print(f" {r.rank}. {r.document.id} (score: {r.score:.3f}, date: {date})")
Similarly, if your documents have popularity metadata like view counts, you can use the popularity method.
# Re-rank by popularity
popularity_ranked = rerank_results(query, initial_results, method="popularity", top_k=4)
print("\nRe-ranked by POPULARITY (views):")
for r in popularity_ranked:
views = r.document.metadata.get('views', 0)
print(f" {r.rank}. {r.document.id} (score: {r.score:.3f}, views: {views})")
The built-in methods are useful, but you often need to implement business logic specific to your application. The rerank_results function supports this through a custom scorer. You can pass a function that calculates a new score for each document, allowing you to combine multiple signals.
For example, you might want to boost documents from a specific author or those belonging to a particularly important category. Your custom scorer function receives the query and a Document object and should return a floating-point score, which is then multiplied by the original relevance score.
from kerb.retrieval import Document
def category_booster(query: str, doc: Document) -> float:
"""Boost scores for documents in 'programming' category and by author 'Alice'."""
score_multiplier = 1.0
if doc.metadata.get('category') == 'programming':
score_multiplier *= 1.5 # Boost programming docs by 50%
if doc.metadata.get('author') == 'Alice':
score_multiplier *= 1.2 # Boost docs by Alice by 20%
return score_multiplier
# Apply the custom scorer
custom_ranked = rerank_results(
query,
initial_results,
method="custom",
scorer=category_booster,
top_k=4
)
print("\nRe-ranked with custom scorer:")
for r in custom_ranked:
category = r.document.metadata.get('category')
author = r.document.metadata.get('author')
print(f" {r.rank}. {r.document.id} (new score: {r.score:.3f})")
print(f" Category: {category}, Author: {author}")
This approach provides a powerful way to inject domain-specific knowledge and business rules directly into your retrieval pipeline.
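Custom scorers are also a convenient place to blend several signals into a single multiplier. The sketch below is illustrative only: it reuses the 'views' and 'date' metadata keys from the earlier examples, and the specific weights and date cutoff are arbitrary starting points rather than recommended values.
import math

def blended_scorer(query: str, doc: Document) -> float:
    """Blend popularity and a simple recency check into a single multiplier."""
    multiplier = 1.0
    # Popularity: log-dampened so a single viral document does not dominate.
    views = doc.metadata.get('views', 0)
    multiplier *= 1.0 + 0.1 * math.log1p(views)
    # Recency: ISO dates ('YYYY-MM-DD') compare correctly as strings.
    if doc.metadata.get('date', '') >= '2024-01-01':
        multiplier *= 1.3  # Boost documents published after the cutoff
    return multiplier

blended_ranked = rerank_results(
    query,
    initial_results,
    method="custom",
    scorer=blended_scorer,
    top_k=4
)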
A common issue in retrieval is that the top results can be highly redundant. For example, a search for "Python async" might return multiple documents that all explain the async/await syntax in slightly different ways. This is not an efficient use of the LLM's context window.
Maximal Marginal Relevance (MMR) is a technique used to select a set of results that is both relevant to the query and diverse. It iteratively selects documents by optimizing a formula that balances these two aspects. The diversify_results function implements MMR.
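In its standard formulation, given the query $q$, the set of already-selected documents $S$, and the remaining candidates $C \setminus S$, MMR picks the next document as
$$\mathrm{MMR} = \arg\max_{d \in C \setminus S} \left[ \lambda \,\mathrm{sim}(d, q) - (1 - \lambda) \max_{d_j \in S} \mathrm{sim}(d, d_j) \right]$$
where $\mathrm{sim}$ is a similarity function and $\lambda$ trades relevance against diversity. Judging from its description, the diversity_factor parameter of diversify_results plays the role of $1 - \lambda$; the exact weighting is an implementation detail of the library.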
The diversity_factor parameter controls this balance:
- A value of 0 prioritizes pure relevance, selecting the highest-scoring documents.
- A value of 1 prioritizes pure diversity, selecting documents that are most different from each other.
- A value between 0 and 1 balances the two. A common starting point is 0.5.
from kerb.retrieval import diversify_results
# Retrieve a larger set of initial candidates
initial_results_for_mmr = keyword_search(query, documents, top_k=8)
print("Before diversity (top 5):")
for r in initial_results_for_mmr[:5]:
print(f" {r.rank}. {r.document.id} - {r.document.content[:50]}...")
# Apply diversification
diverse_results = diversify_results(
initial_results_for_mmr,
max_results=5,
diversity_factor=0.5
)
print(f"\nAfter diversity (diversity_factor=0.5):")
for r in diverse_results:
print(f" {r.rank}. {r.document.id} - {r.document.content[:50]}...")
Using MMR ensures that the context provided to the LLM covers a broader range of information, reducing redundancy and improving the quality of the final generated answer.
For complex questions, a single query might not be sufficient to retrieve all necessary information. A common pattern is to break down a complex query into several sub-queries, run a search for each, and then combine the results. Reciprocal Rank Fusion (RRF) is a simple and effective algorithm for merging multiple ranked lists.
RRF calculates a new score for each document based on its rank in each result list, giving more weight to documents that consistently appear at higher ranks across different searches:
$$\mathrm{RRF}(d) = \sum_{i} \frac{1}{k + r_i(d)}$$
Here, $r_i(d)$ is the rank of document $d$ in result list $i$, and $k$ is a constant (commonly set to 60) that dampens the influence of low ranks.
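For intuition, with $k = 60$, a document ranked 1st in one list and 3rd in another receives a fused score of $\tfrac{1}{60+1} + \tfrac{1}{60+3} \approx 0.0164 + 0.0159 = 0.0323$, whereas a document that appears only once at rank 1 scores about $0.0164$.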
The reciprocal_rank_fusion function implements this. You can use it to combine results from keyword searches, semantic searches, or hybrid approaches on different query variations.
from kerb.retrieval import reciprocal_rank_fusion
# Create multiple result sets from different queries
results1 = keyword_search("python async", documents, top_k=5)
results2 = keyword_search("concurrent programming", documents, top_k=5)
results3 = keyword_search("asyncio library", documents, top_k=5)
# Fuse the results into a single ranked list
fused_results = reciprocal_rank_fusion([results1, results2, results3], k=60)
print("Fused top results from three different queries:")
for r in fused_results[:5]:
print(f" {r.rank}. {r.document.id} (RRF score: {r.score:.3f})")
RRF is a powerful technique for improving recall, ensuring that your RAG system considers a wider set of potentially relevant documents before the final context selection.
In a production environment, these re-ranking techniques are often combined into a multi-stage pipeline to progressively refine the retrieved context.
A multi-stage re-ranking pipeline refines results from initial retrieval to a final, optimized context.
This pipeline ensures that you start with a broad set of documents and systematically narrow them down to the most relevant, diverse, and useful set for the LLM.
# Stage 1: Initial retrieval of a large candidate pool
query = "python async web development"
stage1 = keyword_search(query, documents, top_k=10)
print(f"Stage 1 - Initial retrieval: {len(stage1)} results")
# Stage 2: Re-rank by relevance to get a smaller, more relevant set
stage2 = rerank_results(query, stage1, method="relevance", top_k=6)
print(f"Stage 2 - Relevance re-rank: {len(stage2)} results")
# Stage 3: Apply diversity to avoid redundancy
stage3 = diversify_results(stage2, max_results=4, diversity_factor=0.4)
print(f"Stage 3 - Diversification: {len(stage3)} results")
# Stage 4: Final boost based on popularity or recency
final_context_docs = rerank_results(query, stage3, method="popularity", top_k=3)
print(f"Stage 4 - Final context: {len(final_context_docs)} results")
print("\nFinal ranked documents for LLM context:")
for r in final_context_docs:
print(f" {r.rank}. {r.document.id} (score: {r.score:.3f})")
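As a final step, the selected documents are typically concatenated into the prompt sent to the LLM. The snippet below is a minimal sketch of that assembly; the header format and instruction wording are arbitrary choices for illustration, not part of the kerb API.
# Assemble the final documents into a context block for the LLM prompt.
context = "\n\n".join(
    f"[{r.document.id}]\n{r.document.content}" for r in final_context_docs
)
prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {query}"
)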
By combining these techniques, you can significantly improve the quality of the context provided to your RAG system, leading to more accurate, relevant, and comprehensive answers.