Practical implementation of HNSW, an algorithm utilizing a layered graph structure and a greedy search heuristic for approximate nearest neighbor searches, involves using a common Python library to build an index, perform searches, and tune parameters to balance performance and accuracy. A working Python environment and familiarity with NumPy for numerical operations are assumed.
First, ensure you have the necessary library installed. We'll use hnswlib, a popular and efficient implementation of HNSW. You'll also need numpy for creating vector data.
pip install hnswlib numpy
We'll work with synthetic data for this example. Let's generate some random 128-dimensional vectors, simulating typical embedding outputs from language models.
import hnswlib
import numpy as np
import time
# Define data parameters
dim = 128 # Dimensionality of the vectors
num_elements = 10000 # Number of vectors in our dataset
# Generate random data (replace with your actual embeddings)
np.random.seed(42) # for reproducibility
data = np.float32(np.random.random((num_elements, dim)))
# Generate unique IDs for each vector
data_labels = np.arange(num_elements)
# Generate a few query vectors
num_queries = 5
query_data = np.float32(np.random.random((num_queries, dim)))
Initializing and populating an HNSW index involves specifying the space (distance metric), dimensionality, and then adding the data points.
# Initialize the HNSW index
# Possible space options: 'l2', 'ip' (inner product), 'cosine'
space_name = 'l2' # Euclidean distance
index = hnswlib.Index(space=space_name, dim=dim)
# Set index parameters BEFORE adding data
# M: max number of connections per node per layer (default 16)
# ef_construction: size of dynamic list for neighbors during construction (default 200)
# Higher ef_construction leads to better index quality but slower build time
M = 16
ef_construction = 200
index.init_index(max_elements=num_elements, ef_construction=ef_construction, M=M)
# Add data to the index
# ids should be unique integers
print("Adding data to the index...")
start_time = time.time()
index.add_items(data, data_labels)
build_time = time.time() - start_time
print(f"Index build completed in {build_time:.2f} seconds.")
# Optional: Control the number of threads used for indexing
# index.set_num_threads(4) # Use 4 threads
Here, init_index prepares the index structure. max_elements should be at least the number of elements you intend to add. M controls the density of the graph connections, affecting memory usage and search quality. ef_construction influences the thoroughness of the search for neighbours during the build process; higher values generally lead to better recall but increase indexing time.
Once the index is built, you can perform searches using the knn_query method.
# Set search-time parameter ef_search
# ef_search: size of dynamic list for neighbors during search
# Higher ef_search leads to better recall but slower search time
# Must be >= k (number of neighbors requested)
ef_search = 50
k = 10 # Number of nearest neighbors to retrieve
print(f"\nPerforming k-NN search for {num_queries} queries (k={k}, ef_search={ef_search})...")
index.set_ef(ef_search) # Set ef_search for the query session
start_time = time.time()
labels, distances = index.knn_query(query_data, k=k)
search_time = time.time() - start_time
print(f"Search completed in {search_time:.4f} seconds.")
print(f"Average query time: {search_time / num_queries:.4f} seconds.")
# Display results for the first query
print("\nResults for the first query:")
print("Labels:", labels[0])
print("Distances:", distances[0])
The knn_query method returns two arrays: labels containing the integer IDs of the approximate nearest neighbors, and distances containing the corresponding calculated distances (e.g., L2 distance in our case). The critical parameter here is ef_search (set via set_ef). It controls the size of the candidate list during the search traversal. Increasing ef_search makes the search more exhaustive, potentially improving accuracy (recall) at the cost of higher latency. It must be at least k.
The core challenge in using ANN algorithms like HNSW is balancing accuracy (recall) and performance (query latency, build time, memory usage). The primary tuning knobs are:
M (Build time): Defines the maximum number of outgoing connections for nodes in the graph layers (except layer 0). Higher M creates denser graphs, potentially improving recall and allowing for better navigation, but significantly increases memory footprint and build time. Typical values range from 8 to 64.ef_construction (Build time): Controls the quality of the index construction. Higher values mean a more thorough search for neighbours when inserting elements, leading to a potentially better-structured graph (higher recall) but longer build times. Typical values range from 100 to 2000 or more, depending on the dataset and desired quality.ef_search (Search time): Controls the trade-off between search speed and recall at query time. It determines the size of the priority queue used during graph traversal. Higher values increase the chance of finding the true nearest neighbors but slow down the query. This is often the most frequently tuned parameter after the index is built. It must be at least k.A Practical Tuning Strategy:
# Example: Finding exact neighbors (computationally expensive!)
# from sklearn.neighbors import NearestNeighbors
# print("Calculating ground truth (this might take a while)...")
# nbrs = NearestNeighbors(n_neighbors=k, algorithm='brute', metric='euclidean').fit(data)
# true_distances, true_indices = nbrs.kneighbors(query_data)
M and ef_construction (start with defaults like M=16, ef_construction=200). Build the index.ef_search (e.g., from k up to a few hundred). For each value, measure Recall@k and average query latency.ef_search, or if the latency is too high, rebuild the index with increased ef_construction or M. Be mindful that increasing these parameters increases build time and potentially index size (especially M).ef_search sweep and evaluation.Example Visualization:
Let's simulate results for different ef_search values and plot the trade-off. Assume we have calculated recall against some ground truth.
# Simulated tuning results (replace with actual measurements)
ef_values = [10, 20, 50, 100, 200, 400]
recalls = [0.75, 0.85, 0.92, 0.96, 0.98, 0.99] # Example recall values
latencies_ms = [0.5, 0.8, 1.5, 2.8, 5.0, 9.5] # Example latencies in milliseconds
import json
chart_data = {
"layout": {
"title": "HNSW Tuning: Recall@10 vs. Query Latency",
"xaxis": {"title": "Average Query Latency (ms)"},
"yaxis": {"title": "Recall@10", "range": [0.7, 1.0]},
"legend": {"title": "ef_search"},
"margin": {"l": 50, "r": 50, "t": 50, "b": 50},
"width": 600,
"height": 400
},
"data": [
{
"x": latencies_ms,
"y": recalls,
"mode": "lines+markers",
"type": "scatter",
"name": "Recall",
"text": [f"ef={ef}" for ef in ef_values], # Text displayed on hover
"marker": {"color": "#339af0", "size": 8},
"line": {"color": "#339af0", "width": 2}
}
]
}
# Display chart (embedding JSON in markdown)
print("```plotly")
print(json.dumps(chart_data))
print("```")
The plot illustrates how increasing
ef_searchtypically improves recall but also increases query latency. The optimalef_searchdepends on the application's specific requirements for accuracy and speed.
For production use, you'll want to save your tuned index and load it without rebuilding.
# Save the index to disk
index_path = 'my_hnsw_index.bin'
print(f"\nSaving index to {index_path}...")
index.save_index(index_path)
print("Index saved.")
# Load the index (in a new session or script)
# Need to know dim and space_name used during creation
loaded_index = hnswlib.Index(space=space_name, dim=dim)
print(f"\nLoading index from {index_path}...")
loaded_index.load_index(index_path)
print("Index loaded.")
# You can now set ef and query the loaded index
loaded_index.set_ef(ef_search)
labels_loaded, distances_loaded = loaded_index.knn_query(query_data, k=k)
# Verify results are the same
assert np.array_equal(labels, labels_loaded), "Loaded index results differ!"
print("Loaded index query successful and matches original.")
This practical exercise demonstrates the fundamental steps of using HNSW: initialization, building, querying, and the critical process of tuning parameters like M, ef_construction, and especially ef_search to achieve the desired balance between search accuracy and performance efficiency for your vector search application. Remember that optimal parameters are dataset-dependent and require careful evaluation based on your specific use case requirements.
Cleaner syntax. Built-in debugging. Production-ready from day one.
Built for the AI systems behind ApX Machine Learning
Was this section helpful?
© 2026 ApX Machine LearningEngineered with