Having explored the theoretical underpinnings of HNSW, including its layered graph structure and greedy search heuristic, it's time to translate that theory into practice. This section guides you through implementing an HNSW index using a common Python library, building an index, performing searches, and tuning its parameters to balance performance and accuracy for your specific needs. We assume you have a working Python environment and are familiar with NumPy for numerical operations.

Setting Up the Environment

First, ensure you have the necessary library installed. We'll use hnswlib, a popular and efficient implementation of HNSW. You'll also need numpy for creating vector data.

pip install hnswlib numpy

We'll work with synthetic data for this example. Let's generate some random 128-dimensional vectors, simulating typical embedding outputs from language models.

import hnswlib
import numpy as np
import time

# Define data parameters
dim = 128  # Dimensionality of the vectors
num_elements = 10000  # Number of vectors in our dataset

# Generate random data (replace with your actual embeddings)
np.random.seed(42) # for reproducibility
data = np.float32(np.random.random((num_elements, dim)))

# Generate unique IDs for each vector
data_labels = np.arange(num_elements)

# Generate a few query vectors
num_queries = 5
query_data = np.float32(np.random.random((num_queries, dim)))

Building the HNSW Index

Initializing and populating an HNSW index involves specifying the space (distance metric), dimensionality, and then adding the data points.

# Initialize the HNSW index
# Possible space options: 'l2', 'ip' (inner product), 'cosine'
space_name = 'l2' # Euclidean distance
index = hnswlib.Index(space=space_name, dim=dim)

# Set index parameters BEFORE adding data
# M: max number of connections per node per layer (default 16)
# ef_construction: size of dynamic list for neighbors during construction (default 200)
# Higher ef_construction leads to better index quality but slower build time
M = 16
ef_construction = 200
index.init_index(max_elements=num_elements, ef_construction=ef_construction, M=M)

# Add data to the index
# ids should be unique integers
print("Adding data to the index...")
start_time = time.time()
index.add_items(data, data_labels)
build_time = time.time() - start_time
print(f"Index build completed in {build_time:.2f} seconds.")

# Optional: Control the number of threads used for indexing
# index.set_num_threads(4) # Use 4 threads

Here, init_index prepares the index structure. max_elements should be at least the number of elements you intend to add. M controls the density of the graph connections, affecting memory usage and search quality. ef_construction influences the thoroughness of the search for neighbours during the build process; higher values generally lead to better recall but increase indexing time.

Performing Approximate Nearest Neighbor Searches

Once the index is built, you can perform searches using the knn_query method.

# Set search-time parameter ef_search
# ef_search: size of dynamic list for neighbors during search
# Higher ef_search leads to better recall but slower search time
# Must be >= k (number of neighbors requested)
ef_search = 50
k = 10 # Number of nearest neighbors to retrieve

print(f"\nPerforming k-NN search for {num_queries} queries (k={k}, ef_search={ef_search})...")

index.set_ef(ef_search) # Set ef_search for the query session

start_time = time.time()
labels, distances = index.knn_query(query_data, k=k)
search_time = time.time() - start_time

print(f"Search completed in {search_time:.4f} seconds.")
print(f"Average query time: {search_time / num_queries:.4f} seconds.")

# Display results for the first query
print("\nResults for the first query:")
print("Labels:", labels[0])
print("Distances:", distances[0])

The knn_query method returns two arrays: labels containing the integer IDs of the approximate nearest neighbors, and distances containing the corresponding calculated distances (e.g., L2 distance in our case). The critical parameter here is ef_search (set via set_ef). It controls the size of the candidate list during the search traversal. Increasing ef_search makes the search more exhaustive, potentially improving accuracy (recall) at the cost of higher latency. It must be at least k.

Tuning HNSW Parameters: The Recall/Speed Trade-off

The core challenge in using ANN algorithms like HNSW is balancing accuracy (recall) and performance (query latency, build time, memory usage). The primary tuning knobs are:

M (Build time): Defines the maximum number of outgoing connections for nodes in the graph layers (except layer 0). Higher M creates denser graphs, potentially improving recall and allowing for better navigation, but significantly increases memory footprint and build time. Typical values range from 8 to 64.
ef_construction (Build time): Controls the quality of the index construction. Higher values mean a more thorough search for neighbours when inserting elements, leading to a potentially better-structured graph (higher recall) but longer build times. Typical values range from 100 to 2000 or more, depending on the dataset and desired quality.
ef_search (Search time): Controls the trade-off between search speed and recall at query time. It determines the size of the priority queue used during graph traversal. Higher values increase the chance of finding the true nearest neighbors but slow down the query. This is often the most frequently tuned parameter after the index is built. It must be at least k.

A Practical Tuning Strategy:

Establish Ground Truth: For rigorous tuning, you need ground truth: the actual $k$ nearest neighbors for your query set. This usually requires running an exact, brute-force search. This is feasible for smaller datasets or representative samples.

# Example: Finding exact neighbors (computationally expensive!)
# from sklearn.neighbors import NearestNeighbors
# print("Calculating ground truth (this might take a while)...")
# nbrs = NearestNeighbors(n_neighbors=k, algorithm='brute', metric='euclidean').fit(data)
# true_distances, true_indices = nbrs.kneighbors(query_data)

Define Metrics: Measure:
- Recall@k: The proportion of true $k$ nearest neighbors found by HNSW among its top $k$ results, averaged over all queries. $Recall@k = \frac{1}{N_q} \sum_{i=1}^{N_q} \frac{| \text{ANN}_k(q_i) \cap \text{True}_k(q_i) |}{k}$ Where $N_q$ is the number of queries, $\text{ANN}_k(q_i)$ are the $k$ neighbors returned by HNSW for query $q_i$ , and $\text{True}_k(q_i)$ are the true $k$ neighbors.
- Query Latency: Average time taken per query.
- Index Size: Memory required to store the index.
- Build Time: Time taken to construct the index.
Iterative Tuning:
- Fix M and ef_construction (start with defaults like M=16, ef_construction=200). Build the index.
- Vary ef_search (e.g., from $k$ up to a few hundred). For each value, measure Recall@k and average query latency.
- Plot Recall@k vs. Latency. This curve clearly shows the trade-off for the current index configuration.
- If the desired recall cannot be reached even with high ef_search, or if the latency is too high, rebuild the index with increased ef_construction or M. Be mindful that increasing these parameters increases build time and potentially index size (especially M).
- Repeat the ef_search sweep and evaluation.

Example Visualization:

Let's simulate results for different ef_search values and plot the trade-off. Assume we have calculated recall against some ground truth.

# Simulated tuning results (replace with actual measurements)
ef_values = [10, 20, 50, 100, 200, 400]
recalls = [0.75, 0.85, 0.92, 0.96, 0.98, 0.99] # Example recall values
latencies_ms = [0.5, 0.8, 1.5, 2.8, 5.0, 9.5] # Example latencies in milliseconds

import json

chart_data = {
  "layout": {
    "title": "HNSW Tuning: Recall@10 vs. Query Latency",
    "xaxis": {"title": "Average Query Latency (ms)"},
    "yaxis": {"title": "Recall@10", "range": [0.7, 1.0]},
    "legend": {"title": "ef_search"},
    "margin": {"l": 50, "r": 50, "t": 50, "b": 50},
    "width": 600,
    "height": 400
  },
  "data": [
    {
      "x": latencies_ms,
      "y": recalls,
      "mode": "lines+markers",
      "type": "scatter",
      "name": "Recall",
      "text": [f"ef={ef}" for ef in ef_values], # Text displayed on hover
      "marker": {"color": "#339af0", "size": 8},
      "line": {"color": "#339af0", "width": 2}
    }
  ]
}

# Display chart (embedding JSON in markdown)
print("```plotly")
print(json.dumps(chart_data))
print("```")

The plot illustrates how increasing ef_search typically improves recall but also increases query latency. The optimal ef_search depends on the application's specific requirements for accuracy and speed.

Saving and Loading the Index

For production use, you'll want to save your tuned index and load it without rebuilding.

# Save the index to disk
index_path = 'my_hnsw_index.bin'
print(f"\nSaving index to {index_path}...")
index.save_index(index_path)
print("Index saved.")

# Load the index (in a new session or script)
# Need to know dim and space_name used during creation
loaded_index = hnswlib.Index(space=space_name, dim=dim)
print(f"\nLoading index from {index_path}...")
loaded_index.load_index(index_path)
print("Index loaded.")

# You can now set ef and query the loaded index
loaded_index.set_ef(ef_search)
labels_loaded, distances_loaded = loaded_index.knn_query(query_data, k=k)

# Verify results are the same
assert np.array_equal(labels, labels_loaded), "Loaded index results differ!"
print("Loaded index query successful and matches original.")

This practical exercise demonstrates the fundamental steps of using HNSW: initialization, building, querying, and the critical process of tuning parameters like M, ef_construction, and especially ef_search to achieve the desired balance between search accuracy and performance efficiency for your vector search application. Remember that optimal parameters are dataset-dependent and require careful evaluation based on your specific use case requirements.