Vector search and keyword search each provide distinct lists of ranked documents for a given query. Combining these lists into a single, coherent ranking that uses the strengths of both semantic relevance (from vectors) and term matching (from keywords) is a challenge addressed by result fusion algorithms. Simply concatenating or alternating results often produces suboptimal rankings. Effective fusion requires a principled approach to merge the information contained in the individual ranked lists.
Several techniques exist, ranging from simple heuristics to more sophisticated methods. A common challenge is that the relevance scores produced by different systems (e.g., cosine similarity for vectors, BM25 scores for keywords) are often not directly comparable. Vector similarities might range from -1 to 1 or 0 to 1, while BM25 scores can be unbounded positive numbers. Fusion methods that rely directly on these scores often require careful normalization, which can be difficult to get right.
Reciprocal Rank Fusion (RRF)
Reciprocal Rank Fusion (RRF) offers an elegant solution that sidesteps the score normalization problem by focusing solely on the rank of each document within the individual result lists. It operates on the principle that documents consistently ranked higher across multiple systems are likely more relevant.
The RRF score for a document d is calculated by summing the reciprocal of its rank in each list, adjusted by a constant k:
RRF_Score(d)=i=1∑Nk+ranki(d)1
Where:
- N is the number of result lists being fused (e.g., N=2 for vector + keyword search).
- ranki(d) is the rank of document d in result list i. Ranks typically start at 1 for the top result. If a document does not appear in list i, its contribution from that list is effectively zero (its rank is infinite).
- k is a constant used to mitigate the impact of high ranks (e.g., ranks 1 vs 2) versus lower ranks (e.g., ranks 50 vs 51). A common value for k is 60, based on empirical studies, but it can be tuned. Increasing k reduces the influence of rank differences, making the fusion less sensitive to the top positions.
How RRF Works
Imagine you have results from a vector search (List V) and a keyword search (List K):
- List V: DocA (rank 1), DocB (rank 2), DocC (rank 3)
- List K: DocB (rank 1), DocD (rank 2), DocA (rank 3)
Let's calculate the RRF scores using k=60:
| Document |
Rank (List V) |
Rank (List K) |
Contribution V (1/(60+rank)) |
Contribution K (1/(60+rank)) |
RRF Score (Sum) |
| DocA |
1 |
3 |
1/61 ≈ 0.01639 |
1/63 ≈ 0.01587 |
0.03226 |
| DocB |
2 |
1 |
1/62 ≈ 0.01613 |
1/61 ≈ 0.01639 |
0.03252 |
| DocC |
3 |
Not present |
1/63 ≈ 0.01587 |
0 |
0.01587 |
| DocD |
Not present |
2 |
0 |
1/62 ≈ 0.01613 |
0.01613 |
The final fused list is ordered by descending RRF score: DocB, DocA, DocD, DocC.
RRF calculation results for the example documents, showing DocB slightly outscoring DocA due to its higher rank in one list despite DocA being present in both.
Advantages of RRF:
- Simplicity: Easy to understand and implement.
- No Score Normalization Needed: Ignores absolute scores, relying only on ranks.
- Robustness: Generally performs well across various tasks without extensive tuning.
- Handles Missing Documents: Naturally incorporates documents present in only some lists.
Key Points for RRF:
- Ignores Score Information: Discards potentially useful information contained in the original relevance scores. Two documents ranked 1 and 2 might have very different actual relevance scores, but RRF treats the rank difference the same regardless.
- Choice of k: While often defaulting to 60, the optimal k might vary depending on the characteristics of the result lists and the application. Lower k gives more weight to top-ranked items.
Other Fusion Algorithms
While RRF is a popular and effective choice, other fusion methods exist:
-
Score-Based Combination (CombSUM, CombMNZ): These methods attempt to combine the original relevance scores.
- CombSUM: Simply sums the scores for each document across all lists: CombSUM_Score(d)=∑i=1Nnormalized_scorei(d). This requires scores to be normalized into a comparable range (e.g., 0 to 1) beforehand.
- CombMNZ: Calculates the sum of scores (like CombSUM) and multiplies it by the number of lists the document appeared in: CombMNZ_Score(d)=(∑i=1Nnormalized_scorei(d))×∣{i∣scorei(d)>0}∣. This tends to boost documents found by multiple systems.
- Challenge: The effectiveness of CombSUM and CombMNZ heavily depends on the quality of score normalization. Poor normalization can lead to one system's scores dominating the fusion.
-
Rank-Based Combination (Borda Count): Similar to RRF, Borda Count uses ranks. For a list of size M, it assigns points based on rank: M points for rank 1, M−1 points for rank 2, ..., 1 point for rank M. Scores are summed across lists for each document. Like RRF, it avoids score normalization but uses a different weighting scheme based on rank position.
-
Weighted Fusion: A straightforward approach is to assign weights to each search system and compute a weighted sum of (normalized) scores or ranks.
WeightedScore(d)=i=1∑Nwi×normalized_scorei(d)
Or using ranks (similar to weighted RRF):
WeightedRankScore(d)=i=1∑Nwi×k+ranki(d)1
The challenge lies in determining the optimal weights (wi). These might be set heuristically or, more systematically, learned using machine learning techniques (often called "Learning to Rank" or LTR), where a model is trained on labeled data to predict the best combination. LTR is powerful but significantly more complex to implement and maintain.
Choosing and Implementing Fusion
For many hybrid search applications, RRF provides a strong baseline due to its simplicity and robustness. It's often the recommended starting point.
- Start with RRF (e.g., k=60).
- Evaluate its performance on your specific task and dataset.
- If score normalization is reliable and feasible for your retrieval systems, experiment with CombSUM or CombMNZ.
- Use weighted fusion if you have strong reasons to believe one system is consistently more reliable than another, but be prepared to tune the weights.
- Investigate Learning to Rank if achieving the highest possible relevance is critical and you have the resources and labeled data to train a fusion model.
When implementing fusion:
- Retrieve Results: Fetch the top K results (including document IDs and ranks/scores) from each underlying search system (vector, keyword, etc.).
- Collect Unique Documents: Identify all unique document IDs across all retrieved lists.
- Calculate Fused Scores: For each unique document, apply the chosen fusion logic (RRF, CombSUM, etc.). Remember to handle documents missing from certain lists appropriately (e.g., rank = ∞ or score = 0).
- Sort and Return: Sort the documents based on their final fused scores in descending order and return the top results.
Fusion adds a computational step after retrieving results from individual systems. While typically fast compared to the search itself, ensure the implementation is efficient, especially for latency-sensitive applications. By thoughtfully combining results using methods like RRF, you can build hybrid search systems that significantly outperform any single retrieval method alone.