Let's put theory into practice. Evaluating the performance of your vector search setup is essential for optimizing the trade-off between relevance (recall/precision) and speed (latency/throughput). This hands-on exercise guides you through setting up and executing a comprehensive performance evaluation for an Approximate Nearest Neighbor (ANN) index.
We will simulate evaluating an HNSW index, a common choice covered in Chapter 1. We'll focus on measuring Recall@k and query latency while varying a significant search-time tuning parameter, `efSearch`.
Before starting, ensure you have a working Python environment with libraries like `numpy` for numerical operations, `pandas` for data manipulation, and a library capable of running ANN search (such as `faiss-cpu` or `faiss-gpu`, `annoy`, or a client for a vector database). You will also need:

- A dataset of vectors: `data_vectors` of shape `(N, D)`, where `N` is the number of vectors and `D` is the dimensionality.
- A set of query vectors: `query_vectors` (shape `(Q, D)`) for which you want to find neighbors.
- Ground-truth nearest neighbors for the `query_vectors`. These are typically pre-computed using an exact k-NN search (e.g., a brute-force calculation) on the `data_vectors`. Assume you have them stored, perhaps as a list of lists or a 2D array `ground_truth_indices`, where `ground_truth_indices[i]` contains the indices of the actual top `K` nearest neighbors for `query_vectors[i]`. Let `K`
be the number of neighbors we care about (e.g., `K=10`).

First, we need to build the HNSW index using our chosen library. Here's a conceptual example using Faiss syntax:
```python
import faiss
import numpy as np
import time

# Assume data_vectors is loaded (N, D)
N, D = data_vectors.shape

# HNSW parameters
M = 32                # Number of connections per node
efConstruction = 100  # Construction quality/speed trade-off

# Build the HNSW index
index = faiss.IndexHNSWFlat(D, M)
index.hnsw.efConstruction = efConstruction

print("Building index...")
start_time = time.time()
index.add(data_vectors)
build_time = time.time() - start_time
print(f"Index built in {build_time:.2f} seconds.")

# Note: for production, you'd typically save/load the index
# faiss.write_index(index, "my_hnsw_index.faiss")
# index = faiss.read_index("my_hnsw_index.faiss")
```
This code snippet initializes an HNSW index for vectors of dimension `D`, with up to `M` connections per node in each graph layer. `efConstruction` influences the quality of the graph and the time taken for index building.
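The ground-truth indices assumed above are typically computed once with an exact brute-force search. Here is a minimal NumPy sketch of that step (the helper name `compute_ground_truth` and the small random dataset are illustrative; for large datasets, batch the computation or use an exact index such as Faiss's `IndexFlatL2`, which is much faster):

```python
import numpy as np

def compute_ground_truth(data_vectors, query_vectors, k):
    """Exact top-k nearest neighbors by brute-force squared L2 distance."""
    # Pairwise squared distances between every query and every data vector, shape (Q, N)
    # Note: this broadcast uses O(Q * N * D) memory; batch queries for large inputs
    diffs = query_vectors[:, None, :] - data_vectors[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)
    # Indices of the k smallest distances per query (argpartition is O(N) per row)
    idx = np.argpartition(dists, k, axis=1)[:, :k]
    # Sort those k candidates by actual distance so results are ordered
    row = np.arange(dists.shape[0])[:, None]
    order = np.argsort(dists[row, idx], axis=1)
    return idx[row, order]

# Example with small random data
rng = np.random.default_rng(0)
data = rng.standard_normal((100, 8)).astype("float32")
queries = rng.standard_normal((5, 8)).astype("float32")
ground_truth_indices = compute_ground_truth(data, queries, k=10)
print(ground_truth_indices.shape)  # (5, 10)
```

Since this exact search scales linearly with dataset size, it is usually computed once offline and stored alongside the queries.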
Now, we'll loop through different values of the search-time parameter `efSearch`. This parameter controls the size of the dynamic candidate list maintained during the search phase in HNSW. Higher values generally lead to better recall but increased latency.

For each setting, we'll measure Recall@K, the average query latency, and the throughput in queries per second (QPS):
```python
import pandas as pd

# Assume query_vectors and ground_truth_indices are loaded as arrays
Q = query_vectors.shape[0]
K = ground_truth_indices.shape[1]  # Number of neighbors to evaluate (e.g., 10)

efSearch_values = [16, 32, 64, 128, 256]  # Example values to test
results = []

for efSearch in efSearch_values:
    print(f"\nEvaluating with efSearch = {efSearch}...")
    index.hnsw.efSearch = efSearch  # Set the search-time parameter

    # Perform searches for all queries, retrieving K neighbors each
    start_eval_time = time.time()
    D_ann, I_ann = index.search(query_vectors, K)
    total_eval_time = time.time() - start_eval_time

    avg_latency_ms = (total_eval_time / Q) * 1000
    qps = Q / total_eval_time

    # Calculate Recall@K
    total_matches = 0
    for i in range(Q):
        true_neighbors = set(ground_truth_indices[i])  # Ground truth for query i
        retrieved_neighbors = set(I_ann[i])            # Retrieved neighbors for query i
        total_matches += len(true_neighbors.intersection(retrieved_neighbors))

    # Average recall across all queries
    recall_at_K = total_matches / (Q * K)

    print(f"  Recall@{K}: {recall_at_K:.4f}")
    print(f"  Avg Latency: {avg_latency_ms:.2f} ms")
    print(f"  QPS: {qps:.2f}")

    results.append({
        "efSearch": efSearch,
        "Recall@K": recall_at_K,
        "Avg Latency (ms)": avg_latency_ms,
        "QPS": qps,
    })

# Store results for analysis
results_df = pd.DataFrame(results)
print("\nEvaluation Summary:")
print(results_df)
```
The formula for Recall@K used here is:

$$\text{Recall@K} = \frac{1}{Q} \sum_{i=1}^{Q} \frac{|\text{ANN\_Results}_i \cap \text{GroundTruth}_i|}{K}$$

where $\text{ANN\_Results}_i$ is the set of $K$ indices returned by the ANN search for query $i$, and $\text{GroundTruth}_i$ is the set of $K$ true nearest neighbor indices for query $i$.
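The evaluation loop above computes this inline, but the formula can also be packaged as a standalone helper, which is convenient for reuse across experiments (the function name `recall_at_k` is illustrative):

```python
import numpy as np

def recall_at_k(ann_indices, ground_truth_indices):
    """Mean fraction of true top-K neighbors recovered by the ANN search.

    Both arguments are (Q, K) integer arrays of neighbor indices.
    """
    q, k = ground_truth_indices.shape
    # Sum the per-query overlaps between retrieved and true neighbor sets
    matches = sum(
        len(set(ann_indices[i]) & set(ground_truth_indices[i]))
        for i in range(q)
    )
    return matches / (q * k)

# Perfect agreement gives recall 1.0; disjoint results give 0.0
gt = np.array([[0, 1, 2], [3, 4, 5]])
print(recall_at_k(gt, gt))                                  # 1.0
print(recall_at_k(np.array([[6, 7, 8], [9, 10, 11]]), gt))  # 0.0
```

Note that the metric treats the top-K lists as sets: rank order within the top K does not affect the score.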
The `results_df` DataFrame now holds the performance metrics for each `efSearch` value tested. A common way to visualize the trade-off is to plot Recall@K against average latency or QPS.
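A plotting sketch using matplotlib follows. The numbers below are placeholders standing in for a real `results_df`; in practice you would use the DataFrame produced by the evaluation loop:

```python
import matplotlib
matplotlib.use("Agg")  # Non-interactive backend; render to a file instead of a window
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder numbers standing in for a real results_df
results_df = pd.DataFrame({
    "efSearch": [16, 32, 64, 128, 256],
    "Recall@K": [0.82, 0.91, 0.96, 0.985, 0.995],
    "Avg Latency (ms)": [0.12, 0.18, 0.30, 0.55, 1.05],
})

fig, ax1 = plt.subplots(figsize=(7, 4))
ax1.plot(results_df["efSearch"], results_df["Recall@K"], marker="o", color="tab:blue")
ax1.set_xlabel("efSearch")
ax1.set_ylabel("Recall@K", color="tab:blue")
ax1.set_xscale("log", base=2)  # efSearch values double each step

# Second y-axis for latency, sharing the efSearch x-axis
ax2 = ax1.twinx()
ax2.plot(results_df["efSearch"], results_df["Avg Latency (ms)"], marker="s", color="tab:red")
ax2.set_ylabel("Avg Latency (ms)", color="tab:red")

fig.tight_layout()
fig.savefig("recall_vs_latency.png")
```

The dual-axis layout makes the trade-off visible at a glance: the recall curve flattens while the latency curve keeps climbing.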
Recall@10 and average query latency plotted against the HNSW `efSearch` parameter. This visualization clearly shows the trade-off: higher `efSearch` increases recall but also increases latency.
Alternatively, you can plot Recall@K against QPS:
Recall@10 and queries per second (QPS) plotted against the HNSW `efSearch` parameter. This shows that increasing `efSearch` improves recall but decreases the system's throughput.
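Beyond eyeballing the charts, the results table can also drive the decision programmatically, for example by selecting the highest-throughput setting that meets a recall target. A small sketch (the recall target and the placeholder numbers are illustrative):

```python
import pandas as pd

# Placeholder results standing in for results_df from the evaluation loop
results_df = pd.DataFrame({
    "efSearch": [16, 32, 64, 128, 256],
    "Recall@K": [0.82, 0.91, 0.96, 0.985, 0.995],
    "QPS": [8300, 5600, 3300, 1800, 950],
})

RECALL_TARGET = 0.95  # Illustrative application requirement

# Keep only settings meeting the target, then pick the one with the highest throughput
candidates = results_df[results_df["Recall@K"] >= RECALL_TARGET]
if candidates.empty:
    print("No setting meets the target; consider rebuilding with higher M/efConstruction.")
else:
    best = candidates.loc[candidates["QPS"].idxmax()]
    print(f"Choose efSearch={int(best['efSearch'])} "
          f"(Recall@K={best['Recall@K']:.3f}, QPS={best['QPS']:.0f})")
# → Choose efSearch=64 (Recall@K=0.960, QPS=3300)
```

If no search-time setting reaches the target, the index itself is the bottleneck, which points back to build-time parameters rather than `efSearch`.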
The charts generated from this evaluation allow you to make informed decisions about parameter tuning:

- If recall is the priority, choose a higher `efSearch` value, accepting the associated increase in latency or decrease in QPS.
- If response time is critical, choose a lower `efSearch` value, trading off some recall for faster responses.

This practical exercise provides a template. You can extend it by:
- Evaluating other index parameters (`M` and `efConstruction` for HNSW; `nlist` and `nprobe` for IVF).

Systematic evaluation is fundamental to building effective and efficient vector search systems. By applying these techniques, you can confidently tune your system to meet the specific performance and relevance requirements of your LLM applications.
© 2025 ApX Machine Learning