Having explored the theoretical underpinnings of HNSW, including its layered graph structure and greedy search heuristic, it's time to translate that theory into practice. This section guides you through implementing an HNSW index using a common Python library, building an index, performing searches, and tuning its parameters to balance performance and accuracy for your specific needs. We assume you have a working Python environment and are familiar with NumPy for numerical operations.
First, ensure you have the necessary libraries installed. We'll use hnswlib, a popular and efficient implementation of HNSW. You'll also need numpy for creating vector data.
pip install hnswlib numpy
We'll work with synthetic data for this example. Let's generate some random 128-dimensional vectors, simulating typical embedding outputs from language models.
import hnswlib
import numpy as np
import time
# Define data parameters
dim = 128 # Dimensionality of the vectors
num_elements = 10000 # Number of vectors in our dataset
# Generate random data (replace with your actual embeddings)
np.random.seed(42) # for reproducibility
data = np.float32(np.random.random((num_elements, dim)))
# Generate unique IDs for each vector
data_labels = np.arange(num_elements)
# Generate a few query vectors
num_queries = 5
query_data = np.float32(np.random.random((num_queries, dim)))
Initializing and populating an HNSW index involves specifying the space (distance metric), dimensionality, and then adding the data points.
# Initialize the HNSW index
# Possible space options: 'l2', 'ip' (inner product), 'cosine'
space_name = 'l2' # Euclidean distance
index = hnswlib.Index(space=space_name, dim=dim)
# Set index parameters BEFORE adding data
# M: max number of connections per node per layer (default 16)
# ef_construction: size of dynamic list for neighbors during construction (default 200)
# Higher ef_construction leads to better index quality but slower build time
M = 16
ef_construction = 200
index.init_index(max_elements=num_elements, ef_construction=ef_construction, M=M)
# Add data to the index
# ids should be unique integers
print("Adding data to the index...")
start_time = time.time()
index.add_items(data, data_labels)
build_time = time.time() - start_time
print(f"Index build completed in {build_time:.2f} seconds.")
# Optional: Control the number of threads used for indexing
# index.set_num_threads(4) # Use 4 threads
Here, init_index prepares the index structure. max_elements should be at least the number of elements you intend to add. M controls the density of the graph connections, affecting memory usage and search quality. ef_construction influences the thoroughness of the search for neighbours during the build process; higher values generally lead to better recall but increase indexing time.
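If you later need to store more vectors than the initial max_elements, hnswlib lets you grow an existing index with resize_index. The following is a minimal sketch, assuming the index, dim, and num_elements variables from the snippet above; the extra vectors here are synthetic placeholders for illustration.
# Grow capacity before adding more vectors than max_elements allows
# (assumes `index`, `dim`, and `num_elements` from the snippet above;
# the extra vectors are synthetic placeholders)
num_extra = 1000
extra = np.float32(np.random.random((num_extra, dim)))
extra_labels = np.arange(num_elements, num_elements + num_extra)
index.resize_index(num_elements + num_extra)  # raise the capacity ceiling
index.add_items(extra, extra_labels)
print("Elements in index:", index.get_current_count())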
Once the index is built, you can perform searches using the knn_query method.
# Set search-time parameter ef_search
# ef_search: size of dynamic list for neighbors during search
# Higher ef_search leads to better recall but slower search time
# Must be >= k (number of neighbors requested)
ef_search = 50
k = 10 # Number of nearest neighbors to retrieve
print(f"\nPerforming k-NN search for {num_queries} queries (k={k}, ef_search={ef_search})...")
index.set_ef(ef_search) # Set ef_search for the query session
start_time = time.time()
labels, distances = index.knn_query(query_data, k=k)
search_time = time.time() - start_time
print(f"Search completed in {search_time:.4f} seconds.")
print(f"Average query time: {search_time / num_queries:.4f} seconds.")
# Display results for the first query
print("\nResults for the first query:")
print("Labels:", labels[0])
print("Distances:", distances[0])
The knn_query method returns two arrays: labels contains the integer IDs of the approximate nearest neighbors, and distances contains the corresponding distances (for the 'l2' space, hnswlib reports squared Euclidean distances). The critical parameter here is ef_search, set via set_ef. It controls the size of the candidate list during the search traversal: increasing ef_search makes the search more exhaustive, potentially improving accuracy (recall) at the cost of higher latency. It must be at least k.
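As a quick sanity check, you can compare the approximate results for a single query against an exact brute-force computation in NumPy. This is only a sketch for small datasets, assuming the data, query_data, labels, and k variables from the snippets above; because the labels were assigned as row indices, they can be compared directly.
# Exact brute-force check for the first query (feasible only for small datasets)
# Assumes `data`, `query_data`, `labels`, and `k` from the snippets above
diffs = data - query_data[0]                 # broadcast: (num_elements, dim)
sq_dists = np.sum(diffs * diffs, axis=1)     # squared L2 distances, matching the 'l2' space
true_top_k = np.argsort(sq_dists)[:k]        # row indices of the exact k nearest neighbors
overlap = len(set(labels[0]) & set(true_top_k))
print(f"Recall@{k} for the first query: {overlap / k:.2f}")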
The core challenge in using ANN algorithms like HNSW is balancing accuracy (recall) against performance (query latency, build time, and memory usage). The primary tuning knobs are:

- M (Build time): Defines the maximum number of outgoing connections for nodes in the graph layers (except layer 0, which allows up to 2*M connections in hnswlib). Higher M creates denser graphs, potentially improving recall and allowing for better navigation, but significantly increases memory footprint and build time. Typical values range from 8 to 64.
- ef_construction (Build time): Controls the quality of the index construction. Higher values mean a more thorough search for neighbours when inserting elements, leading to a potentially better-structured graph (higher recall) but longer build times. Typical values range from 100 to 2000 or more, depending on the dataset and desired quality.
- ef_search (Search time): Controls the trade-off between search speed and recall at query time. It determines the size of the priority queue used during graph traversal. Higher values increase the chance of finding the true nearest neighbors but slow down the query. This is often the most frequently tuned parameter after the index is built. It must be at least k.

A Practical Tuning Strategy:

Tuning requires a way to measure recall, so start by establishing a ground truth: the exact nearest neighbors for a representative set of queries, computed with a brute-force search. This is expensive, but only has to be done once.
# Example: Finding exact neighbors (computationally expensive!)
# from sklearn.neighbors import NearestNeighbors
# print("Calculating ground truth (this might take a while)...")
# nbrs = NearestNeighbors(n_neighbors=k, algorithm='brute', metric='euclidean').fit(data)
# true_distances, true_indices = nbrs.kneighbors(query_data)
1. Choose initial values for M and ef_construction (start with defaults like M=16, ef_construction=200). Build the index.
2. Sweep ef_search over a range of values (e.g., from k up to a few hundred), as sketched below. For each value, measure Recall@k and average query latency.
3. If you cannot reach your recall target even with a large ef_search, or if the latency is too high, rebuild the index with increased ef_construction or M. Be mindful that increasing these parameters increases build time and potentially index size (especially M).
4. Repeat the ef_search sweep and evaluation.
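A sketch of such a sweep, assuming the index, query_data, k, and num_queries variables from earlier snippets and the true_indices ground truth from the commented-out brute-force snippet above:
# Sweep ef_search and record Recall@k and average latency for each value
# (assumes `index`, `query_data`, `k`, `num_queries`, and `true_indices` as noted above)
for ef in [10, 20, 50, 100, 200, 400]:
    index.set_ef(ef)
    start = time.time()
    labels_ef, _ = index.knn_query(query_data, k=k)
    avg_latency_ms = (time.time() - start) / num_queries * 1000
    hits = sum(len(set(labels_ef[i]) & set(true_indices[i])) for i in range(num_queries))
    recall = hits / (num_queries * k)
    print(f"ef_search={ef:4d}  Recall@{k}={recall:.3f}  avg latency={avg_latency_ms:.2f} ms")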
Example Visualization:

Let's simulate results for different ef_search values and plot the trade-off. Assume we have calculated recall against some ground truth.
# Simulated tuning results (replace with actual measurements)
ef_values = [10, 20, 50, 100, 200, 400]
recalls = [0.75, 0.85, 0.92, 0.96, 0.98, 0.99] # Example recall values
latencies_ms = [0.5, 0.8, 1.5, 2.8, 5.0, 9.5] # Example latencies in milliseconds
import json
chart_data = {
"layout": {
"title": "HNSW Tuning: Recall@10 vs. Query Latency",
"xaxis": {"title": "Average Query Latency (ms)"},
"yaxis": {"title": "Recall@10", "range": [0.7, 1.0]},
"legend": {"title": "ef_search"},
"margin": {"l": 50, "r": 50, "t": 50, "b": 50},
"width": 600,
"height": 400
},
"data": [
{
"x": latencies_ms,
"y": recalls,
"mode": "lines+markers",
"type": "scatter",
"name": "Recall",
"text": [f"ef={ef}" for ef in ef_values], # Text displayed on hover
"marker": {"color": "#339af0", "size": 8},
"line": {"color": "#339af0", "width": 2}
}
]
}
# Display chart (embedding JSON in markdown)
print("```plotly")
print(json.dumps(chart_data))
print("```")
The plot illustrates how increasing ef_search typically improves recall but also increases query latency. The optimal ef_search depends on the application's specific requirements for accuracy and speed.
For production use, you'll want to save your tuned index and load it without rebuilding.
# Save the index to disk
index_path = 'my_hnsw_index.bin'
print(f"\nSaving index to {index_path}...")
index.save_index(index_path)
print("Index saved.")
# Load the index (in a new session or script)
# Need to know dim and space_name used during creation
loaded_index = hnswlib.Index(space=space_name, dim=dim)
print(f"\nLoading index from {index_path}...")
loaded_index.load_index(index_path)
print("Index loaded.")
# You can now set ef and query the loaded index
loaded_index.set_ef(ef_search)
labels_loaded, distances_loaded = loaded_index.knn_query(query_data, k=k)
# Verify results are the same
assert np.array_equal(labels, labels_loaded), "Loaded index results differ!"
print("Loaded index query successful and matches original.")
This practical exercise demonstrates the fundamental steps of using HNSW: initialization, building, querying, and the critical process of tuning parameters like M, ef_construction, and especially ef_search to achieve the desired balance between search accuracy and performance efficiency for your vector search application. Remember that optimal parameters are dataset-dependent and require careful evaluation based on your specific use case requirements.