Let's transition from the theoretical understanding of hybrid search to a practical implementation. In this section, we'll construct a basic hybrid search pipeline in Python, combining keyword search (specifically BM25) with vector similarity search. This hands-on exercise demonstrates how to integrate these distinct retrieval methods and fuse their results for improved relevance.

We assume you have a working Python environment with libraries for vector manipulation and potentially a vector database client or library such as Faiss. You'll also need libraries for keyword search.

### Prerequisites and Setup

Before we begin, ensure you have the necessary libraries installed. For this example, we'll use `rank_bm25` for keyword search, `sentence-transformers` for embedding generation (though we'll assume vectors are pre-computed for the search phase), and `numpy`.

```bash
pip install rank_bm25 numpy sentence-transformers
```

We'll work with a dataset of documents, where each document has an ID, text content, and a pre-computed vector embedding.

```python
# Sample data structure (replace with your actual data loading).
# The vectors below are truncated placeholders; in practice they come from your embedding model.
documents = [
    {"id": "doc1", "text": "Introduction to machine learning concepts.", "vector": [0.1, 0.2, ..., 0.5]},
    {"id": "doc2", "text": "Advanced deep learning techniques for NLP.", "vector": [0.3, 0.1, ..., 0.7]},
    {"id": "doc3", "text": "A guide to Python programming basics.", "vector": [0.8, 0.7, ..., 0.1]},
    {"id": "doc4", "text": "Optimizing machine learning models.", "vector": [0.2, 0.3, ..., 0.6]},
    # ... more documents
]

# Simulate loading pre-computed vectors into a structure for lookup
vector_index_data = {doc["id"]: doc["vector"] for doc in documents}
document_texts = {doc["id"]: doc["text"] for doc in documents}

# Example query
query_text = "machine learning optimization"

# Assume we have a function to get the query embedding.
# In a real scenario, you'd use the same model as for the document vectors.
from sentence_transformers import SentenceTransformer

embedding_model = SentenceTransformer('all-MiniLM-L6-v2')  # Example model

def get_query_embedding(query):
    return embedding_model.encode(query)

query_vector = get_query_embedding(query_text)
```

### Step 1: Implementing Keyword Search (BM25)

First, let's set up the BM25 component. We'll index the text content of our documents.
The `rank_bm25` library provides a straightforward implementation.

```python
import numpy as np
from rank_bm25 import BM25Okapi

# Tokenize the documents (simple whitespace splitting for this example)
tokenized_corpus = [doc["text"].lower().split() for doc in documents]
doc_ids = [doc["id"] for doc in documents]

# Create the BM25 index
bm25 = BM25Okapi(tokenized_corpus)

def perform_keyword_search(query_text, top_n=10):
    """Performs BM25 search and returns ranked document IDs and scores."""
    tokenized_query = query_text.lower().split()

    # Get scores for all documents
    doc_scores = bm25.get_scores(tokenized_query)

    # Get the indices of the top N scores
    top_indices = np.argsort(doc_scores)[::-1][:top_n]

    # Map indices back to document IDs and create the results list
    results = []
    for index in top_indices:
        if doc_scores[index] > 0:  # Only include docs with a non-zero score
            results.append({"id": doc_ids[index], "score": doc_scores[index]})
    return results

# Perform keyword search
keyword_results = perform_keyword_search(query_text, top_n=5)
print("Keyword Search Results (BM25):")
print(keyword_results)

# Expected Output (example):
# Keyword Search Results (BM25):
# [{'id': 'doc4', 'score': 0.937...}, {'id': 'doc1', 'score': 0.937...}]
# Note: exact scores depend heavily on the corpus; documents with a zero
# BM25 score (no query-term overlap) are filtered out.
```

This gives us a list of document IDs ranked by their BM25 relevance score for the query.

### Step 2: Implementing Vector Search

Next, we simulate the vector search component. In a real system, this would involve querying an ANN index (such as HNSW in Faiss, OpenSearch, Pinecone, Weaviate, etc.) built from `vector_index_data`. Here, we simulate it by calculating cosine similarity against all vectors, which is inefficient for large datasets but sufficient for demonstrating the hybrid concept.

```python
from numpy.linalg import norm

def cosine_similarity(vec_a, vec_b):
    """Computes cosine similarity between two vectors."""
    return np.dot(vec_a, vec_b) / (norm(vec_a) * norm(vec_b))

def perform_vector_search(query_vector, index_data, top_n=10):
    """Performs vector search and returns ranked document IDs and scores."""
    scores = []
    for doc_id, doc_vector in index_data.items():
        similarity = cosine_similarity(query_vector, doc_vector)
        scores.append({"id": doc_id, "score": similarity})

    # Sort by score descending
    scores.sort(key=lambda x: x["score"], reverse=True)
    return scores[:top_n]

# Perform vector search
vector_results = perform_vector_search(query_vector, vector_index_data, top_n=5)
print("\nVector Search Results:")
print(vector_results)

# Expected Output (example):
# Vector Search Results:
# [{'id': 'doc4', 'score': 0.85...}, {'id': 'doc1', 'score': 0.78...}, {'id': 'doc2', 'score': 0.65...}, {'id': 'doc3', 'score': 0.12...}]
```

Now we have a second list of document IDs, ranked by semantic similarity. Notice that the scores are in a different range (cosine similarity is typically in [-1, 1] or [0, 1], while BM25 scores are usually positive and unbounded) and the ranking may differ significantly from the keyword results.

### Step 3: Result Fusion using Reciprocal Rank Fusion (RRF)

As discussed previously, simply adding or averaging scores is problematic because of the different scales and distributions. Reciprocal Rank Fusion (RRF) offers a robust way to combine rankings without requiring score normalization.

The RRF score for a document $d$ is calculated as:

$$Score_{RRF}(d) = \sum_{i \in \text{Result Lists}} \frac{1}{k + rank_i(d)}$$

where $rank_i(d)$ is the rank of document $d$ in result list $i$ (starting from 1), and $k$ is a constant controlling the influence of lower-ranked items.
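To make the formula concrete, here is a small worked example with illustrative ranks (not taken from the code output below): a document ranked 1st in the keyword list and 3rd in the vector list, with $k = 60$, receives

$$Score_{RRF}(d) = \frac{1}{60 + 1} + \frac{1}{60 + 3} \approx 0.0164 + 0.0159 = 0.0323$$

A document that appears in only one list contributes a single reciprocal-rank term, so documents ranked highly by both retrievers naturally rise to the top of the fused list.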
A common value for $k$ is 60.

Let's implement the RRF function:

```python
def apply_rrf(results_lists, k=60):
    """Applies Reciprocal Rank Fusion to combine multiple ranked lists."""
    fused_scores = {}
    doc_data = {}  # To store original scores if needed, or just the ID

    for results in results_lists:
        for rank, result in enumerate(results):
            doc_id = result["id"]
            if doc_id not in fused_scores:
                fused_scores[doc_id] = 0
                doc_data[doc_id] = {"id": doc_id}  # Store associated data

            # Calculate the RRF contribution for this list
            rrf_score = 1.0 / (k + rank + 1)  # Rank is 0-based, add 1
            fused_scores[doc_id] += rrf_score

    # Combine IDs and scores, then sort
    fused_results = [{"id": doc_id, "score": fused_scores[doc_id]} for doc_id in fused_scores]
    fused_results.sort(key=lambda x: x["score"], reverse=True)
    return fused_results

# Combine the keyword and vector results
hybrid_results = apply_rrf([keyword_results, vector_results], k=60)
print("\nHybrid Search Results (RRF):")
print(hybrid_results)

# Expected Output (example, depends on input ranks):
# Hybrid Search Results (RRF):
# [{'id': 'doc4', 'score': 0.032...}, {'id': 'doc1', 'score': 0.032...}, {'id': 'doc2', 'score': 0.016...}, {'id': 'doc3', 'score': 0.015...}]
# Note: RRF scores are relative; the ranking is the primary output.
# In this example, doc4 and doc1 appeared high in both lists, so they receive higher RRF scores.
```

### Putting it Together: The Hybrid Pipeline Workflow

We can encapsulate the process into a single function representing our hybrid search pipeline.

```dot
digraph HybridSearchPipeline {
    rankdir=LR;
    node [shape=box, style=filled, fillcolor="#a5d8ff"];
    edge [color="#495057"];

    Query [label="User Query", shape=ellipse, style=filled, fillcolor="#ffec99"];
    Embedding [label="Generate Query Embedding"];
    VectorSearch [label="Vector Search\n(ANN Index)"];
    KeywordSearch [label="Keyword Search\n(BM25 Index)"];
    Fusion [label="Result Fusion\n(RRF)", style=filled, fillcolor="#96f2d7"];
    Results [label="Ranked Results", shape=ellipse, style=filled, fillcolor="#ffec99"];

    Query -> Embedding [label=" text "];
    Query -> KeywordSearch [label=" text "];
    Embedding -> VectorSearch [label=" vector "];

    subgraph cluster_retrieval {
        label = "Retrieval Stage";
        style=filled;
        color="#e9ecef";
        VectorSearch;
        KeywordSearch;
    }

    VectorSearch -> Fusion [label=" Vector Ranks "];
    KeywordSearch -> Fusion [label=" Keyword Ranks "];
    Fusion -> Results;
}
```

*A diagram illustrating the hybrid search pipeline flow: the user query is used both for keyword search and to generate an embedding for vector search, and the results from both are then fused (e.g., using RRF) to produce the final ranked list.*

```python
def hybrid_search_pipeline(query_text, top_n=10, rrf_k=60):
    """Executes the full hybrid search pipeline."""
    print(f"Performing hybrid search for: '{query_text}'")

    # 1. Get Query Embedding
    query_vector = get_query_embedding(query_text)

    # 2. Perform Keyword Search (retrieve more candidates than we finally return)
    keyword_results = perform_keyword_search(query_text, top_n=top_n * 2)
    print(f"  BM25 returned {len(keyword_results)} results.")

    # 3. Perform Vector Search
    # Replace this placeholder with the actual call to your vector search system
    vector_results = perform_vector_search(query_vector, vector_index_data, top_n=top_n * 2)
    print(f"  Vector Search returned {len(vector_results)} results.")

    # 4. Apply RRF Fusion
    fused_results = apply_rrf([keyword_results, vector_results], k=rrf_k)
    print(f"  Fused results generated {len(fused_results)} candidates.")
    # 5. Return the final top N results, including document text
    final_results = []
    for result in fused_results[:top_n]:
        doc_id = result["id"]
        final_results.append({
            "id": doc_id,
            "text": document_texts.get(doc_id, "N/A"),  # Fetch text for context
            "rrf_score": result["score"]
        })
    return final_results

# Example Usage
final_ranked_list = hybrid_search_pipeline(query_text, top_n=5)
print("\nFinal Hybrid Ranked List:")
for i, item in enumerate(final_ranked_list):
    print(f"{i+1}. ID: {item['id']}, Score: {item['rrf_score']:.4f}, Text: {item['text'][:80]}...")
```

### Next Steps

- **Performance:** This pipeline executes two separate search queries (keyword and vector) and then performs fusion. Run sequentially, as in our demo, the total latency is roughly the sum of the two search latencies; run in parallel, it is roughly the maximum of the two, plus the fusion overhead (usually negligible). For production systems, optimizing the individual search components (Chapter 2) and parallelizing the queries is important; a sketch of parallel retrieval appears at the end of this section.
- **Tuning $k$:** The RRF constant $k$ influences how much weight is given to lower-ranked items: a smaller $k$ gives more importance to top ranks. Its optimal value may depend on the specific dataset and query patterns and can be tuned via offline evaluation (Chapter 5).
- **Retrieval size:** Notice that we retrieved `top_n * 2` results from each system before fusion. Retrieving more candidates (e.g., the top 50 or 100) from each source before fusion generally leads to better final results, at the cost of slightly higher fusion computation.
- **Alternative fusion:** While RRF is common, explore other methods such as weighted score combination (if scores can be meaningfully normalized; see the sketch at the end of this section) or more complex learning-to-rank models trained on features from both search results.
- **Evaluation:** Rigorously evaluate the hybrid system against pure vector and pure keyword search using relevance metrics (such as NDCG and Recall) on a ground-truth dataset, as detailed in Chapter 5. This helps quantify the benefit and identify areas for improvement.

This hands-on exercise provides a foundational blueprint for building hybrid search systems. By combining the strengths of lexical and semantic retrieval, you can often achieve significantly better relevance than either approach alone, particularly for the complex information needs common in LLM applications like RAG.
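As a follow-up to the Performance point, here is a minimal sketch of running the two retrievers concurrently with a thread pool. It reuses the `perform_keyword_search`, `perform_vector_search`, `get_query_embedding`, and `apply_rrf` helpers defined earlier; the function name `hybrid_search_parallel` and the thread-pool approach are illustrative choices, not part of the pipeline above. For the in-memory simulations used here the benefit is limited by Python's GIL, but when each retriever is a network call to a search backend, the threads overlap that I/O and retrieval latency approaches the slower of the two calls.

```python
from concurrent.futures import ThreadPoolExecutor

def hybrid_search_parallel(query_text, top_n=10, rrf_k=60):
    """Runs keyword and vector search concurrently, then fuses the rankings with RRF."""
    query_vector = get_query_embedding(query_text)

    with ThreadPoolExecutor(max_workers=2) as executor:
        # Submit both searches so they run concurrently; retrieval latency is
        # then governed by the slower of the two rather than their sum.
        keyword_future = executor.submit(perform_keyword_search, query_text, top_n * 2)
        vector_future = executor.submit(perform_vector_search, query_vector,
                                        vector_index_data, top_n * 2)
        keyword_results = keyword_future.result()
        vector_results = vector_future.result()

    # Fuse the two rankings and keep the top N
    return apply_rrf([keyword_results, vector_results], k=rrf_k)[:top_n]
```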
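The Alternative fusion bullet mentions weighted score combination; below is a minimal sketch of one such approach, assuming min-max normalization is meaningful for your score distributions. The `apply_weighted_fusion` name and the default `keyword_weight=0.3` are illustrative, not recommendations; the weight should be tuned via the offline evaluation discussed above.

```python
def apply_weighted_fusion(keyword_results, vector_results, keyword_weight=0.3):
    """Min-max normalizes each result list to [0, 1] and combines them with a weighted sum."""
    def normalize(results):
        # Map document IDs to scores rescaled into [0, 1] within this list.
        if not results:
            return {}
        scores = [r["score"] for r in results]
        lo, hi = min(scores), max(scores)
        if hi == lo:
            return {r["id"]: 1.0 for r in results}
        return {r["id"]: (r["score"] - lo) / (hi - lo) for r in results}

    kw_scores = normalize(keyword_results)
    vec_scores = normalize(vector_results)
    vector_weight = 1.0 - keyword_weight

    combined = {}
    for doc_id in set(kw_scores) | set(vec_scores):
        # A document missing from one list contributes 0 for that component.
        combined[doc_id] = (keyword_weight * kw_scores.get(doc_id, 0.0)
                            + vector_weight * vec_scores.get(doc_id, 0.0))

    fused = [{"id": doc_id, "score": score} for doc_id, score in combined.items()]
    fused.sort(key=lambda x: x["score"], reverse=True)
    return fused

# Usage mirrors apply_rrf:
# weighted_results = apply_weighted_fusion(keyword_results, vector_results, keyword_weight=0.3)
```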