Retrieval methods for augmenting language models are effective at identifying a broad collection of potentially relevant documents. However, these methods often prioritize finding all possible matches (recall) over ensuring that the top results are the most useful (precision). The result can be a context window filled with redundant or tangentially related information, which may confuse the LLM or waste valuable token space. Re-ranking serves as a second-stage process that refines the initial set of retrieved documents, applying more sophisticated scoring to select the most relevant, diverse, and useful documents for the generation model.
The simplest way to improve relevance is by re-ordering the initial search results based on additional signals. These signals often come from document metadata, such as publication dates or popularity metrics. The rerank_results function provides a straightforward way to apply these strategies.
For instance, after performing an initial search, you can re-rank the results to prioritize more recent or more popular documents. This is particularly useful in applications like news summarization or question-answering over community forums.
Let's start with an initial set of search results.
from kerb.retrieval import keyword_search, rerank_results
# Assume 'documents' is a list of Document objects with metadata
query = "python async programming"
initial_results = keyword_search(query, documents, top_k=10)
print("Initial keyword search results (top 4):")
for r in initial_results[:4]:
print(f" {r.rank}. {r.document.id} (score: {r.score:.3f})")
Now, we can apply different re-ranking methods. To favor newer documents, we use the recency method, which inspects date information in the metadata.
# Re-rank by recency
recency_ranked = rerank_results(query, initial_results, method="recency", top_k=4)
print("\nRe-ranked by RECENCY (date):")
for r in recency_ranked:
date = r.document.metadata.get('date', 'N/A')
print(f" {r.rank}. {r.document.id} (score: {r.score:.3f}, date: {date})")
Similarly, if your documents have popularity metadata like view counts, you can use the popularity method.
# Re-rank by popularity
popularity_ranked = rerank_results(query, initial_results, method="popularity", top_k=4)
print("\nRe-ranked by POPULARITY (views):")
for r in popularity_ranked:
views = r.document.metadata.get('views', 0)
print(f" {r.rank}. {r.document.id} (score: {r.score:.3f}, views: {views})")
The built-in methods are useful, but you often need to implement business logic specific to your application. The rerank_results function supports this through a custom scorer. You can pass a function that calculates a new score for each document, allowing you to combine multiple signals.
For example, you might want to boost documents from a specific author or those belonging to a particularly important category. Your custom scorer function receives the query and a Document object and should return a floating-point score, which is then multiplied by the original relevance score.
from kerb.retrieval import Document
def category_booster(query: str, doc: Document) -> float:
"""Boost scores for documents in 'programming' category and by author 'Alice'."""
score_multiplier = 1.0
if doc.metadata.get('category') == 'programming':
score_multiplier *= 1.5 # Boost programming docs by 50%
if doc.metadata.get('author') == 'Alice':
score_multiplier *= 1.2 # Boost docs by Alice by 20%
return score_multiplier
# Apply the custom scorer
custom_ranked = rerank_results(
query,
initial_results,
method="custom",
scorer=category_booster,
top_k=4
)
print("\nRe-ranked with custom scorer:")
for r in custom_ranked:
category = r.document.metadata.get('category')
author = r.document.metadata.get('author')
print(f" {r.rank}. {r.document.id} (new score: {r.score:.3f})")
print(f" Category: {category}, Author: {author}")
This approach provides a powerful way to inject domain-specific knowledge and business rules directly into your retrieval pipeline.
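Custom scorers are also a convenient place to blend several signals into a single multiplier. The sketch below is illustrative only: it reuses the 'views' and 'date' metadata keys from the earlier examples, and the specific weights and date cutoff are arbitrary starting points rather than recommended values.
import math

def blended_scorer(query: str, doc: Document) -> float:
    """Blend popularity and a simple recency check into a single multiplier."""
    multiplier = 1.0
    # Popularity: log-dampened so a single viral document does not dominate.
    views = doc.metadata.get('views', 0)
    multiplier *= 1.0 + 0.1 * math.log1p(views)
    # Recency: ISO dates ('YYYY-MM-DD') compare correctly as strings.
    if doc.metadata.get('date', '') >= '2024-01-01':
        multiplier *= 1.3  # Boost documents published after the cutoff
    return multiplier

blended_ranked = rerank_results(
    query,
    initial_results,
    method="custom",
    scorer=blended_scorer,
    top_k=4
)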
A common issue in retrieval is that the top results can be highly redundant. For example, a search for "Python async" might return multiple documents that all explain the async/await syntax in slightly different ways. This is not an efficient use of the LLM's context window.
Maximal Marginal Relevance (MMR) is a technique used to select a set of results that is both relevant to the query and diverse. It iteratively selects documents by optimizing a formula that balances these two aspects. The diversify_results function implements MMR.
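In its standard formulation, given the query $q$, the set of already-selected documents $S$, and the remaining candidates $C \setminus S$, MMR picks the next document as
$$\mathrm{MMR} = \arg\max_{d \in C \setminus S} \left[ \lambda \,\mathrm{sim}(d, q) - (1 - \lambda) \max_{d_j \in S} \mathrm{sim}(d, d_j) \right]$$
where $\mathrm{sim}$ is a similarity function and $\lambda$ trades relevance against diversity. Judging from its description, the diversity_factor parameter of diversify_results plays the role of $1 - \lambda$; the exact weighting is an implementation detail of the library.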
The diversity_factor parameter controls this balance:
- A value of 0 prioritizes pure relevance, selecting the highest-scoring documents.
- A value of 1 prioritizes pure diversity, selecting documents that are most different from each other.
- A value between 0 and 1 balances the two. A common starting point is 0.5.
from kerb.retrieval import diversify_results
# Retrieve a larger set of initial candidates
initial_results_for_mmr = keyword_search(query, documents, top_k=8)
print("Before diversity (top 5):")
for r in initial_results_for_mmr[:5]:
print(f" {r.rank}. {r.document.id} - {r.document.content[:50]}...")
# Apply diversification
diverse_results = diversify_results(
initial_results_for_mmr,
max_results=5,
diversity_factor=0.5
)
print(f"\nAfter diversity (diversity_factor=0.5):")
for r in diverse_results:
print(f" {r.rank}. {r.document.id} - {r.document.content[:50]}...")
Using MMR ensures that the context provided to the LLM covers a broader range of information, reducing redundancy and improving the quality of the final generated answer.
For complex questions, a single query might not be sufficient to retrieve all necessary information. A common pattern is to break down a complex query into several sub-queries, run a search for each, and then combine the results. Reciprocal Rank Fusion (RRF) is a simple and effective algorithm for merging multiple ranked lists.
RRF calculates a new score for each document based on its rank in each result list, giving more weight to documents that consistently appear at higher ranks across different searches:
$$\mathrm{RRF}(d) = \sum_{i} \frac{1}{k + r_i(d)}$$
Here, $r_i(d)$ is the rank of document $d$ in result list $i$, and $k$ is a constant (commonly set to 60) that dampens the influence of low ranks.
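For intuition, with $k = 60$, a document ranked 1st in one list and 3rd in another receives a fused score of $\tfrac{1}{60+1} + \tfrac{1}{60+3} \approx 0.0164 + 0.0159 = 0.0323$, whereas a document that appears only once at rank 1 scores about $0.0164$.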
The reciprocal_rank_fusion function implements this. You can use it to combine results from keyword searches, semantic searches, or hybrid approaches on different query variations.
from kerb.retrieval import reciprocal_rank_fusion
# Create multiple result sets from different queries
results1 = keyword_search("python async", documents, top_k=5)
results2 = keyword_search("concurrent programming", documents, top_k=5)
results3 = keyword_search("asyncio library", documents, top_k=5)
# Fuse the results into a single ranked list
fused_results = reciprocal_rank_fusion([results1, results2, results3], k=60)
print("Fused top results from three different queries:")
for r in fused_results[:5]:
print(f" {r.rank}. {r.document.id} (RRF score: {r.score:.3f})")
RRF is a powerful technique for improving recall, ensuring that your RAG system considers a wider set of potentially relevant documents before the final context selection.
In a production environment, these re-ranking techniques are often combined into a multi-stage pipeline to progressively refine the retrieved context.
A multi-stage re-ranking pipeline refines results from initial retrieval to a final, optimized context.
This pipeline ensures that you start with a broad set of documents and systematically narrow them down to the most relevant, diverse, and useful set for the LLM.
# Stage 1: Initial retrieval of a large candidate pool
query = "python async web development"
stage1 = keyword_search(query, documents, top_k=10)
print(f"Stage 1 - Initial retrieval: {len(stage1)} results")
# Stage 2: Re-rank by relevance to get a smaller, more relevant set
stage2 = rerank_results(query, stage1, method="relevance", top_k=6)
print(f"Stage 2 - Relevance re-rank: {len(stage2)} results")
# Stage 3: Apply diversity to avoid redundancy
stage3 = diversify_results(stage2, max_results=4, diversity_factor=0.4)
print(f"Stage 3 - Diversification: {len(stage3)} results")
# Stage 4: Final boost based on popularity or recency
final_context_docs = rerank_results(query, stage3, method="popularity", top_k=3)
print(f"Stage 4 - Final context: {len(final_context_docs)} results")
print("\nFinal ranked documents for LLM context:")
for r in final_context_docs:
print(f" {r.rank}. {r.document.id} (score: {r.score:.3f})")
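As a final step, the selected documents are typically concatenated into the prompt sent to the LLM. The snippet below is a minimal sketch of that assembly; the header format and instruction wording are arbitrary choices for illustration, not part of the kerb API.
# Assemble the final documents into a context block for the LLM prompt.
context = "\n\n".join(
    f"[{r.document.id}]\n{r.document.content}" for r in final_context_docs
)
prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {query}"
)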
By combining these techniques, you can significantly improve the quality of the context provided to your RAG system, leading to more accurate, relevant, and comprehensive answers.