Alright, let's put theory into practice. You've learned about the different ANN algorithms and the parameters that govern their behavior. Now, we'll get our hands dirty and see firsthand how adjusting these parameters impacts search performance. The goal isn't just to run code, but to build intuition about the trade-offs between indexing time, search speed, and accuracy (recall).
We'll primarily focus on HNSW, as it's a widely used and effective algorithm available in many vector databases. We'll use the qdrant-client library, which allows us to run a Qdrant instance entirely in memory for easy experimentation and provides fine-grained control over HNSW parameters.
First, ensure you have the necessary libraries installed. You'll need qdrant-client for the vector database interactions, numpy for numerical operations, sentence-transformers to generate some example embeddings (or you can use pre-computed ones), and scikit-learn to calculate recall against a ground truth. We'll also use the standard library's time module to measure latency.
pip install qdrant-client numpy sentence-transformers scikit-learn
Next, let's prepare some data and set up our Qdrant client running in memory. We'll generate a small set of random vectors for simplicity, but you could easily substitute these with embeddings generated from real text data using a model like all-MiniLM-L6-v2.
import numpy as np
from qdrant_client import QdrantClient, models
from sentence_transformers import SentenceTransformer # Or use your own embeddings
import time
from sklearn.neighbors import NearestNeighbors
# Configuration
NUM_VECTORS = 1000
DIMENSION = 384 # Example dimension for all-MiniLM-L6-v2
COLLECTION_NAME = "hnsw_experiment_collection"
DISTANCE_METRIC = models.Distance.COSINE
# Generate sample data (replace with real embeddings if desired)
print(f"Generating {NUM_VECTORS} random vectors of dimension {DIMENSION}...")
# Use a fixed seed for reproducibility
np.random.seed(42)
data_vectors = np.random.rand(NUM_VECTORS, DIMENSION).astype(np.float32)
query_vector = np.random.rand(1, DIMENSION).astype(np.float32)
# --- Optional: Use Sentence Transformers for more realistic data ---
# model = SentenceTransformer('all-MiniLM-L6-v2')
# sample_texts = ["This is sentence one.", "Here is another sentence.", "Example text for vector search."] # Add more texts
# data_vectors = model.encode(sample_texts)
# query_text = "A query about sentences."
# query_vector = model.encode([query_text])
# DIMENSION = data_vectors.shape[1]
# NUM_VECTORS = data_vectors.shape[0]
# ------------------------------------------------------------------
print("Sample data generated.")
print(f"Data vectors shape: {data_vectors.shape}")
print(f"Query vector shape: {query_vector.shape}")
# Initialize Qdrant client in memory
client = QdrantClient(":memory:")
print("Qdrant client initialized in memory.")
# Calculate ground truth (exact nearest neighbors) using scikit-learn
print("Calculating ground truth (exact KNN)...")
nn_exact = NearestNeighbors(n_neighbors=10, metric='cosine', algorithm='brute')
nn_exact.fit(data_vectors)
ground_truth_indices = nn_exact.kneighbors(query_vector, return_distance=False)[0]
ground_truth_set = set(ground_truth_indices)
print("Ground truth calculated.")
# Helper function to calculate recall@10
def calculate_recall(retrieved_indices, ground_truth):
retrieved_set = set(idx for idx in retrieved_indices)
true_positives = len(retrieved_set.intersection(ground_truth))
recall = true_positives / len(ground_truth) if len(ground_truth) > 0 else 0
return recall
This setup creates NUM_VECTORS random vectors, defines a query_vector, initializes an in-memory Qdrant client, and, importantly, calculates the ground_truth_indices using scikit-learn's exact nearest neighbor search. This ground truth will be our reference for calculating recall.
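As a quick sanity check, here is a tiny, made-up example of how the calculate_recall helper behaves; the indices below are purely illustrative and not part of the experiment data:
# Illustrative check of the recall helper with made-up indices
toy_ground_truth = {0, 1, 2, 3, 4}
toy_retrieved = [0, 1, 2, 7, 9]  # 3 of the 5 true neighbors were found
print(calculate_recall(toy_retrieved, toy_ground_truth))  # prints 0.6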
HNSW has several important parameters. We'll focus on three:
m: The maximum number of connections per node on each layer of the graph. A higher m generally leads to better recall but increases index size and indexing time.
ef_construct: The size of the dynamic candidate list used during index construction. A larger ef_construct considers more potential neighbors while building the graph, leading to a potentially higher-quality index (better recall) at the cost of longer indexing time.
ef_search (exposed as hnsw_ef in Qdrant's search parameters, distinct from ef_construct): The size of the dynamic candidate list used during search. A larger ef_search explores more potential neighbors at query time, generally improving recall but increasing query latency.
Let's design a function to create a collection, index data, and perform searches, allowing us to vary these parameters.
def run_hnsw_experiment(m_value, ef_construct_value, ef_search_value, k=10):
"""
Creates a Qdrant collection with specific HNSW parameters, indexes data,
performs a search, and returns metrics.
"""
# Delete collection if it exists from previous runs
try:
client.delete_collection(collection_name=COLLECTION_NAME)
# print(f"Collection '{COLLECTION_NAME}' deleted.")
except Exception:
# print(f"Collection '{COLLECTION_NAME}' does not exist, skipping deletion.")
pass
time.sleep(0.1) # Short pause to ensure deletion completes
# Create collection with specified HNSW parameters
start_index_time = time.time()
client.create_collection(
collection_name=COLLECTION_NAME,
vectors_config=models.VectorParams(
size=DIMENSION,
distance=DISTANCE_METRIC
),
hnsw_config=models.HnswConfigDiff(
m=m_value,
ef_construct=ef_construct_value
)
)
# Index the data (using IDs 0 to NUM_VECTORS-1)
client.upsert(
collection_name=COLLECTION_NAME,
points=models.Batch(
ids=list(range(NUM_VECTORS)),
vectors=data_vectors.tolist()
),
wait=True # Wait for indexing to complete
)
index_time = time.time() - start_index_time
# Perform the search
start_search_time = time.time()
search_result = client.search(
collection_name=COLLECTION_NAME,
query_vector=query_vector[0].tolist(),
search_params=models.SearchParams(
hnsw_ef=ef_search_value, # This is the search-time ef parameter
exact=False # Ensure we use the ANN index
),
limit=k # Retrieve top k results
)
search_latency = time.time() - start_search_time
# Calculate recall
retrieved_ids = [hit.id for hit in search_result]
recall = calculate_recall(retrieved_ids, ground_truth_set)
# print(f"Params: m={m_value}, ef_construct={ef_construct_value}, ef_search={ef_search_value}")
# print(f" Index Time: {index_time:.4f}s")
# print(f" Search Latency: {search_latency:.4f}s")
# print(f" Recall@{k}: {recall:.4f}")
# print("-" * 20)
return {
"m": m_value,
"ef_construct": ef_construct_value,
"ef_search": ef_search_value,
"index_time": index_time,
"search_latency": search_latency,
"recall": recall
}
Now we can run experiments by calling this function with different parameter combinations.
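Before sweeping any parameter, it can help to run the function once with a single, arbitrary baseline configuration to confirm everything works; the values below are just a starting point, not a recommendation:
# Single baseline run to confirm the setup works (arbitrary parameter choices)
baseline = run_hnsw_experiment(m_value=16, ef_construct_value=100, ef_search_value=50)
print(f"Baseline: recall={baseline['recall']:.3f}, "
      f"index time={baseline['index_time']:.3f}s, "
      f"latency={baseline['search_latency']:.4f}s")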
Experiment: Varying ef_search
Let's keep m and ef_construct constant and see how changing ef_search affects recall and latency. A higher ef_search should generally increase both recall and latency.
results_ef_search = []
m_fixed = 16
ef_construct_fixed = 100
ef_search_values = [10, 20, 50, 100, 200] # Try different search complexities
print(f"\n--- Experiment: Varying ef_search (m={m_fixed}, ef_construct={ef_construct_fixed}) ---")
for ef_s in ef_search_values:
result = run_hnsw_experiment(m_fixed, ef_construct_fixed, ef_s)
results_ef_search.append(result)
print(f"ef_search={ef_s}: Recall={result['recall']:.3f}, Latency={result['search_latency']:.4f}s")
# Prepare data for plotting
latencies = [r['search_latency'] for r in results_ef_search]
recalls = [r['recall'] for r in results_ef_search]
ef_s_labels = [str(r['ef_search']) for r in results_ef_search]
We can visualize this trade-off:
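One way to do this is with matplotlib (not included in the pip command above, so install it separately if needed). The following is a minimal plotting sketch using the lists prepared above:
# Optional visualization of the recall/latency trade-off (requires matplotlib)
import matplotlib.pyplot as plt

fig, ax1 = plt.subplots(figsize=(7, 4))
ax1.plot(ef_s_labels, recalls, marker="o", color="tab:blue")
ax1.set_xlabel("ef_search")
ax1.set_ylabel("Recall@10", color="tab:blue")
ax2 = ax1.twinx()  # second y-axis for latency
ax2.plot(ef_s_labels, latencies, marker="s", color="tab:red")
ax2.set_ylabel("Search latency (s)", color="tab:red")
ax1.set_title("Recall and latency vs. ef_search")
fig.tight_layout()
plt.show()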
Recall improves significantly as ef_search increases, but search latency also rises. The optimal value depends on whether your application prioritizes higher accuracy or lower latency. Note: exact values depend heavily on data, hardware, and library implementation.
Experiment: Varying ef_construct
Now, let's fix m and ef_search and see how ef_construct impacts the index build time and the resulting search quality. A higher ef_construct should increase indexing time but potentially allow better recall even with a lower ef_search.
results_ef_construct = []
m_fixed = 16
ef_search_fixed = 50 # Keep search effort moderate
ef_construct_values = [50, 100, 200, 400] # Try different construction complexities
print(f"\n--- Experiment: Varying ef_construct (m={m_fixed}, ef_search={ef_search_fixed}) ---")
for ef_c in ef_construct_values:
result = run_hnsw_experiment(m_fixed, ef_c, ef_search_fixed)
results_ef_construct.append(result)
print(f"ef_construct={ef_c}: Index Time={result['index_time']:.3f}s, Recall={result['recall']:.3f}, Latency={result['search_latency']:.4f}s")
# Prepare data for plotting
index_times_efc = [r['index_time'] for r in results_ef_construct]
recalls_efc = [r['recall'] for r in results_ef_construct]
ef_c_labels = [str(r['ef_construct']) for r in results_ef_construct]
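The same plotting approach works here; a possible sketch (again assuming matplotlib is available) charts index time alongside recall:
# Optional visualization: index time and recall vs. ef_construct (requires matplotlib)
import matplotlib.pyplot as plt

fig, ax1 = plt.subplots(figsize=(7, 4))
ax1.bar(ef_c_labels, index_times_efc, color="tab:gray")
ax1.set_xlabel("ef_construct")
ax1.set_ylabel("Index time (s)")
ax2 = ax1.twinx()  # second y-axis for recall
ax2.plot(ef_c_labels, recalls_efc, marker="o", color="tab:green")
ax2.set_ylabel("Recall@10", color="tab:green")
ax1.set_title("Index time and recall vs. ef_construct")
fig.tight_layout()
plt.show()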
Increasing ef_construct significantly increases the time needed to build the index. While it can improve recall for a fixed ef_search, the gains may diminish after a certain point. A higher ef_construct builds a potentially "better" graph, which may let you achieve good recall with a slightly lower ef_search later.
Experiment: Varying m
Finally, let's vary m while keeping the ef parameters fixed. Remember, m controls the graph's connectivity.
results_m = []
ef_construct_fixed = 100
ef_search_fixed = 50
m_values = [8, 16, 32, 64] # Try different connectivity levels
print(f"\n--- Experiment: Varying m (ef_construct={ef_construct_fixed}, ef_search={ef_search_fixed}) ---")
for m_val in m_values:
result = run_hnsw_experiment(m_val, ef_construct_fixed, ef_search_fixed)
results_m.append(result)
print(f"m={m_val}: Index Time={result['index_time']:.3f}s, Recall={result['recall']:.3f}, Latency={result['search_latency']:.4f}s")
# Prepare data for plotting
index_times_m = [r['index_time'] for r in results_m]
recalls_m = [r['recall'] for r in results_m]
m_labels = [str(r['m']) for r in results_m]
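If you want a chart for this sweep as well, the same pattern applies (assuming matplotlib):
# Optional visualization: index time and recall vs. m (requires matplotlib)
import matplotlib.pyplot as plt

fig, ax1 = plt.subplots(figsize=(7, 4))
ax1.bar(m_labels, index_times_m, color="tab:gray")
ax1.set_xlabel("m")
ax1.set_ylabel("Index time (s)")
ax2 = ax1.twinx()  # second y-axis for recall
ax2.plot(m_labels, recalls_m, marker="o", color="tab:purple")
ax2.set_ylabel("Recall@10", color="tab:purple")
ax1.set_title("Index time and recall vs. m")
fig.tight_layout()
plt.show()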
Increasing m also increases index build time (and index size, though not measured here) while generally improving recall. Similar to ef_construct, the impact on recall may show diminishing returns. Common values for m are often in the range of 16 to 64.
This hands-on exercise demonstrates the tangible effects of tuning ANN index parameters. You observed how:
Increasing ef_search directly trades higher search latency for better recall.
Increasing ef_construct and m invests more time (and potentially memory) during indexing to build a higher-quality graph structure, which can lead to better recall or allow lower ef_search values during queries.
The optimal parameters are not universal. They depend heavily on your data, your hardware, the library implementation, and your application's accuracy and latency requirements.
Real-world tuning often involves more systematic experimentation, potentially using larger datasets and automated evaluation frameworks to explore the parameter space and find the best balance for your specific needs. This practical experience, however, provides a foundational understanding of how these parameters influence the system's behavior, empowering you to make informed decisions when building your own semantic search applications.
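As a starting point for that kind of systematic exploration, a small grid sweep over the parameters covered here might look like the sketch below; the parameter ranges and the 95% recall target are illustrative choices, not recommendations:
# A minimal grid sweep over the three HNSW parameters (illustrative ranges)
import itertools

grid = {
    "m": [8, 16, 32],
    "ef_construct": [100, 200],
    "ef_search": [20, 50, 100],
}

all_results = []
for m_val, ef_c, ef_s in itertools.product(grid["m"], grid["ef_construct"], grid["ef_search"]):
    all_results.append(run_hnsw_experiment(m_val, ef_c, ef_s))

# Keep configurations meeting an example recall target, then pick the fastest one
good_enough = [r for r in all_results if r["recall"] >= 0.95]
if good_enough:
    best = min(good_enough, key=lambda r: r["search_latency"])
    print(f"Best config meeting the target: m={best['m']}, "
          f"ef_construct={best['ef_construct']}, ef_search={best['ef_search']} "
          f"(recall={best['recall']:.3f}, latency={best['search_latency']:.4f}s)")
else:
    print("No configuration reached the recall target; widen the grid.")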