Approximate Nearest Neighbor (ANN) algorithms are controlled by a handful of parameters, and the best way to understand them is to experiment. In this practical section we adjust those parameters and observe how they affect search performance, building intuition about the trade-offs between indexing time, search speed, and accuracy (recall).

We'll focus on HNSW, since it is a widely used and effective algorithm available in many vector databases. We'll use the `qdrant-client` library, which lets us run a Qdrant instance entirely in memory for easy experimentation and gives us fine-grained control over the HNSW parameters.

## Setup

First, ensure you have the necessary libraries installed. You'll need `qdrant-client` for the vector database interactions, `numpy` for numerical operations, `sentence-transformers` to generate example embeddings (or you can use pre-computed ones), and `scikit-learn` to calculate recall against an exact ground truth. The standard-library `time` module is used to measure latency.

```bash
pip install qdrant-client numpy sentence-transformers scikit-learn
```

Next, let's prepare some data and set up our Qdrant client running in memory. We'll generate a small set of random vectors for simplicity, but you could easily substitute embeddings generated from real text using a model like `all-MiniLM-L6-v2`.

```python
import numpy as np
from qdrant_client import QdrantClient, models
from sentence_transformers import SentenceTransformer  # Or use your own embeddings
import time
from sklearn.neighbors import NearestNeighbors

# Configuration
NUM_VECTORS = 1000
DIMENSION = 384  # Example dimension for all-MiniLM-L6-v2
COLLECTION_NAME = "hnsw_experiment_collection"
DISTANCE_METRIC = models.Distance.COSINE

# Generate sample data (replace with real embeddings if desired)
print(f"Generating {NUM_VECTORS} random vectors of dimension {DIMENSION}...")
# Use a fixed seed for reproducibility
np.random.seed(42)
data_vectors = np.random.rand(NUM_VECTORS, DIMENSION).astype(np.float32)
query_vector = np.random.rand(1, DIMENSION).astype(np.float32)

# --- Optional: Use Sentence Transformers for more realistic data ---
# model = SentenceTransformer('all-MiniLM-L6-v2')
# sample_texts = ["This is sentence one.", "Here is another sentence.", "Example text for vector search."]  # Add more texts
# data_vectors = model.encode(sample_texts)
# query_text = "A query about sentences."
# query_vector = model.encode([query_text])
# DIMENSION = data_vectors.shape[1]
# NUM_VECTORS = data_vectors.shape[0]
# ------------------------------------------------------------------

print("Sample data generated.")
print(f"Data vectors shape: {data_vectors.shape}")
print(f"Query vector shape: {query_vector.shape}")

# Initialize Qdrant client in memory
client = QdrantClient(":memory:")
print("Qdrant client initialized in memory.")

# Calculate ground truth (exact nearest neighbors) using scikit-learn
print("Calculating ground truth (exact KNN)...")
nn_exact = NearestNeighbors(n_neighbors=10, metric='cosine', algorithm='brute')
nn_exact.fit(data_vectors)
ground_truth_indices = nn_exact.kneighbors(query_vector, return_distance=False)[0]
ground_truth_set = set(ground_truth_indices)
print("Ground truth calculated.")

# Helper function to calculate recall@10
def calculate_recall(retrieved_indices, ground_truth):
    retrieved_set = set(idx for idx in retrieved_indices)
    true_positives = len(retrieved_set.intersection(ground_truth))
    recall = true_positives / len(ground_truth) if len(ground_truth) > 0 else 0
    return recall
```

This setup creates `NUM_VECTORS` random vectors, defines a `query_vector`, initializes an in-memory Qdrant client, and, importantly, calculates the `ground_truth_indices` using scikit-learn's exact nearest neighbor search. This ground truth is our reference for calculating recall.

## Experimenting with HNSW Parameters

HNSW has several important parameters. We'll focus on three:

- `m`: The maximum number of connections per node on each layer of the graph. A higher `m` generally leads to better recall but increases index size and indexing time.
- `ef_construct`: The size of the dynamic candidate list used during index construction. A larger `ef_construct` considers more potential neighbors while building the graph, producing a potentially higher-quality index (better recall) at the cost of longer indexing time.
- `ef_search` (exposed as `hnsw_ef` in Qdrant's search parameters, distinct from `ef_construct`): The size of the dynamic candidate list used during search. A larger `ef_search` examines more potential neighbors at query time, generally improving recall but increasing query latency.

Let's write a function that creates a collection, indexes the data, and performs a search, allowing us to vary these parameters.

```python
def run_hnsw_experiment(m_value, ef_construct_value, ef_search_value, k=10):
    """
    Creates a Qdrant collection with specific HNSW parameters, indexes data,
    performs a search, and returns metrics.
""" # Delete collection if it exists from previous runs try: client.delete_collection(collection_name=COLLECTION_NAME) # print(f"Collection '{COLLECTION_NAME}' deleted.") except Exception: # print(f"Collection '{COLLECTION_NAME}' does not exist, skipping deletion.") pass time.sleep(0.1) # Short pause to ensure deletion completes # Create collection with specified HNSW parameters start_index_time = time.time() client.create_collection( collection_name=COLLECTION_NAME, vectors_config=models.VectorParams( size=DIMENSION, distance=DISTANCE_METRIC ), hnsw_config=models.HnswConfigDiff( m=m_value, ef_construct=ef_construct_value ) ) # Index the data (using IDs 0 to NUM_VECTORS-1) client.upsert( collection_name=COLLECTION_NAME, points=models.Batch( ids=list(range(NUM_VECTORS)), vectors=data_vectors.tolist() ), wait=True # Wait for indexing to complete ) index_time = time.time() - start_index_time # Perform the search start_search_time = time.time() search_result = client.search( collection_name=COLLECTION_NAME, query_vector=query_vector[0].tolist(), search_params=models.SearchParams( hnsw_ef=ef_search_value, # This is the search-time ef parameter exact=False # Ensure we use the ANN index ), limit=k # Retrieve top k results ) search_latency = time.time() - start_search_time # Calculate recall retrieved_ids = [hit.id for hit in search_result] recall = calculate_recall(retrieved_ids, ground_truth_set) # print(f"Params: m={m_value}, ef_construct={ef_construct_value}, ef_search={ef_search_value}") # print(f" Index Time: {index_time:.4f}s") # print(f" Search Latency: {search_latency:.4f}s") # print(f" Recall@{k}: {recall:.4f}") # print("-" * 20) return { "m": m_value, "ef_construct": ef_construct_value, "ef_search": ef_search_value, "index_time": index_time, "search_latency": search_latency, "recall": recall } Now we can run experiments by calling this function with different parameter combinations.Varying ef_searchLet's keep m and ef_construct constant and see how changing ef_search affects recall and latency. A higher ef_search should generally increase both.results_ef_search = [] m_fixed = 16 ef_construct_fixed = 100 ef_search_values = [10, 20, 50, 100, 200] # Try different search complexities print(f"\n--- Experiment: Varying ef_search (m={m_fixed}, ef_construct={ef_construct_fixed}) ---") for ef_s in ef_search_values: result = run_hnsw_experiment(m_fixed, ef_construct_fixed, ef_s) results_ef_search.append(result) print(f"ef_search={ef_s}: Recall={result['recall']:.3f}, Latency={result['search_latency']:.4f}s") # Prepare data for plotting latencies = [r['search_latency'] for r in results_ef_search] recalls = [r['recall'] for r in results_ef_search] ef_s_labels = [str(r['ef_search']) for r in results_ef_search] We can visualize this trade-off:{"layout": {"title": "Recall vs. Search Latency (Varying ef_search)", "xaxis": {"title": "Search Latency (seconds)"}, "yaxis": {"title": "Recall@10", "range": [0, 1.1]}, "hovermode": "closest"}, "data": [{"type": "scatter", "mode": "lines+markers+text", "x": [0.0015, 0.0018, 0.0025, 0.0035, 0.0050], "y": [0.7, 0.8, 0.9, 0.95, 1.0], "text": ["ef=10", "ef=20", "ef=50", "ef=100", "ef=200"], "textposition": "top right", "marker": {"color": "#228be6", "size": 8}, "line": {"color": "#74c0fc"}}]}Recall improves significantly as ef_search increases, but search latency also rises. The optimal value depends on whether your application prioritizes higher accuracy or lower latency. 
*Note: Exact values depend heavily on your data, hardware, and library implementation.*

### Varying ef_construct

Now, let's fix `m` and `ef_search` and see how `ef_construct` impacts the index build time and the resulting search quality. A higher `ef_construct` should increase index time but potentially allow better recall even with a lower `ef_search`.

```python
results_ef_construct = []
m_fixed = 16
ef_search_fixed = 50  # Keep search effort moderate
ef_construct_values = [50, 100, 200, 400]  # Try different construction complexities

print(f"\n--- Experiment: Varying ef_construct (m={m_fixed}, ef_search={ef_search_fixed}) ---")
for ef_c in ef_construct_values:
    result = run_hnsw_experiment(m_fixed, ef_c, ef_search_fixed)
    results_ef_construct.append(result)
    print(f"ef_construct={ef_c}: Index Time={result['index_time']:.3f}s, Recall={result['recall']:.3f}, Latency={result['search_latency']:.4f}s")

# Prepare data for plotting
index_times_efc = [r['index_time'] for r in results_ef_construct]
recalls_efc = [r['recall'] for r in results_ef_construct]
ef_c_labels = [str(r['ef_construct']) for r in results_ef_construct]
```

*Figure: Recall@10 vs. index build time while varying `ef_construct`. In a representative run, recall climbs from about 0.85 at `ef_construct=50` to 0.95 at `ef_construct=400`, while build time grows from roughly 0.35 s to 1.6 s.*

Increasing `ef_construct` significantly increases the time needed to build the index. While it can improve recall for a fixed `ef_search`, the gains tend to diminish after a certain point. A higher `ef_construct` builds a potentially better graph, which may let you reach good recall with a slightly lower `ef_search` at query time.

### Varying m

Finally, let's vary `m` while keeping the `ef` parameters fixed. Remember, `m` controls the graph's connectivity.

```python
results_m = []
ef_construct_fixed = 100
ef_search_fixed = 50
m_values = [8, 16, 32, 64]  # Try different connectivity levels

print(f"\n--- Experiment: Varying m (ef_construct={ef_construct_fixed}, ef_search={ef_search_fixed}) ---")
for m_val in m_values:
    result = run_hnsw_experiment(m_val, ef_construct_fixed, ef_search_fixed)
    results_m.append(result)
    print(f"m={m_val}: Index Time={result['index_time']:.3f}s, Recall={result['recall']:.3f}, Latency={result['search_latency']:.4f}s")

# Prepare data for plotting
index_times_m = [r['index_time'] for r in results_m]
recalls_m = [r['recall'] for r in results_m]
m_labels = [str(r['m']) for r in results_m]
```

*Figure: Recall@10 vs. index build time while varying `m`. In a representative run, recall rises from about 0.88 at `m=8` to 0.94 at `m=64`, while build time grows from roughly 0.45 s to 1.1 s.*

Increasing `m` also increases index build time (and index size, though we don't measure that here) while generally improving recall. As with `ef_construct`, the impact on recall can show diminishing returns. Common values for `m` are in the range of 16 to 64.
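Before drawing conclusions, it can help to see all three experiment runs side by side. The snippet below is a small optional sketch (plain Python, no additional dependencies) that prints every result dictionary collected above as one aligned table.

```python
# Combine the three experiment runs and print them as a simple aligned table.
all_results = results_ef_search + results_ef_construct + results_m

header = f"{'m':>4} {'ef_construct':>12} {'ef_search':>9} {'index_time(s)':>14} {'latency(s)':>11} {'recall':>7}"
print(header)
print("-" * len(header))
for r in all_results:
    print(f"{r['m']:>4} {r['ef_construct']:>12} {r['ef_search']:>9} "
          f"{r['index_time']:>14.3f} {r['search_latency']:>11.4f} {r['recall']:>7.3f}")
```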
## Discussion

This hands-on exercise demonstrates the tangible effects of tuning ANN index parameters. You observed how:

- Increasing `ef_search` directly trades higher search latency for better recall.
- Increasing `ef_construct` and `m` invests more time (and potentially memory) during indexing to build a higher-quality graph structure, which can lead to better recall or allow lower `ef_search` values during queries.

The optimal parameters are not universal. They depend heavily on:

- **Your specific dataset:** the distribution and dimensionality of your vectors matter.
- **Your hardware:** CPU performance and memory bandwidth influence both indexing and search speed.
- **Your application's requirements:** do you need the highest possible recall, even if queries take longer, or is minimizing latency the primary goal, accepting slightly lower recall?

In practice, tuning involves more systematic experimentation, often with larger datasets and automated evaluation frameworks, to explore the parameter space and find the best balance for your specific needs. This practical exercise, however, provides a foundational understanding of how these parameters influence the system's behavior, so you can make informed decisions when building your own semantic search applications.
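As a starting point for that kind of systematic experimentation, here is a minimal grid-sweep sketch. It reuses `run_hnsw_experiment` and keeps the lowest-latency configuration that meets a recall target; the specific grids and the 0.95 target are arbitrary assumptions for illustration, not recommendations.

```python
import itertools

# Minimal grid-sweep sketch over HNSW parameters.
# The grids and the 0.95 recall target are illustrative assumptions;
# note this runs 3 * 2 * 3 = 18 index builds, so it takes a while.
m_grid = [8, 16, 32]
ef_construct_grid = [100, 200]
ef_search_grid = [20, 50, 100]
TARGET_RECALL = 0.95

sweep_results = []
for m_val, ef_c, ef_s in itertools.product(m_grid, ef_construct_grid, ef_search_grid):
    sweep_results.append(run_hnsw_experiment(m_val, ef_c, ef_s))

# Among configurations that reach the recall target, pick the fastest search.
candidates = [r for r in sweep_results if r["recall"] >= TARGET_RECALL]
if candidates:
    best = min(candidates, key=lambda r: r["search_latency"])
    print(f"Best config meeting recall >= {TARGET_RECALL}: "
          f"m={best['m']}, ef_construct={best['ef_construct']}, ef_search={best['ef_search']} "
          f"(latency={best['search_latency']:.4f}s, recall={best['recall']:.3f})")
else:
    print("No configuration reached the target recall; consider larger ef values.")
```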