Once a user submits a search request, our system needs to understand its semantic meaning, not just the literal words used. This involves transforming the raw query (typically text, though more advanced systems may accept other data types) into a vector representation that lives in the same high-dimensional space as the vectors representing our indexed documents. This query vector acts as the probe we use to find relevant items within our vector database.
The process is straightforward but requires careful execution, particularly regarding consistency with the indexing stage.
The journey begins when the user types their query into a search bar or submits it via an API endpoint. Let's break down the typical steps:
Receive Raw Query: The system captures the user's input. This might be a simple string like "latest advancements in battery technology" or a more complex natural language question.
Preprocessing (Optional): Depending on the embedding model chosen during the indexing phase (Chapter 1), some basic text preprocessing might be applied. This could include lowercasing the text, stripping punctuation, or normalizing whitespace.
However, many modern transformer-based embedding models (like Sentence-BERT variants) are designed to handle raw or minimally processed text effectively. They often capture nuances from capitalization and punctuation. Overly aggressive preprocessing can sometimes strip away useful semantic information. The guiding principle is to preprocess the query in exactly the same way the source documents were preprocessed before their embeddings were generated. If no preprocessing was done on the source documents, none should be done on the query.
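One way to enforce this consistency is to factor the preprocessing into a single function shared by the indexing and query pipelines, so the two can never drift apart. The following is a minimal sketch; preprocess, raw_documents, and raw_query are illustrative names, and the specific steps shown (lowercasing, trimming) are placeholders for whatever was actually chosen at indexing time:
def preprocess(text: str) -> str:
    # Whatever steps were chosen at indexing time; here, lowercasing and trimming
    return text.lower().strip()

raw_documents = ["Battery Tech Advances in 2024", "Grid-Scale Storage Overview"]
raw_query = "latest advancements in battery technology"

# Indexing time: applied to every source document before embedding
processed_docs = [preprocess(doc) for doc in raw_documents]

# Query time: the identical function, guaranteeing consistency
processed_query = preprocess(raw_query)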
Apply the Embedding Model: This is the most significant step. The preprocessed (or raw) query text is fed into the same embedding model that was used to create the vectors for the documents stored in the database. Consistency here is absolutely essential. Using a different model, or even a different version or configuration of the same model, will produce a query vector that resides in a potentially incompatible vector space, leading to meaningless similarity comparisons and poor search results.
from sentence_transformers import SentenceTransformer
# Load the *exact same* model used for indexing the data
# Example: using a common Sentence-BERT model
model_name = 'all-MiniLM-L6-v2'
model = SentenceTransformer(model_name)
# User query
user_query = "What are the best practices for distributed system monitoring?"
# --- Preprocessing Step (Example - apply IF used during indexing) ---
# processed_query = user_query.lower() # Apply same steps as indexing
processed_query = user_query # Assuming minimal preprocessing for this model
# --- Generate the embedding ---
query_vector = model.encode(processed_query)
# query_vector is now a NumPy array representing the query in the vector space
print(f"Query: {user_query}")
print(f"Generated Vector Dimension: {query_vector.shape}")
# Example Output: Generated Vector Dimension: (384,)
# print(f"Vector Snippet: {query_vector[:5]}") # Display first few dimensions
Vector Normalization (Recommended): Cosine Similarity (discussed in Chapter 1) compares only the orientation (direction) of vectors, ignoring magnitude. Normalizing vectors to unit length (L2 norm) makes the inner product equal to the cosine similarity, which is how many vector databases compute it efficiently. While Sentence-BERT models often produce normalized vectors by default, it is good practice, particularly if you are unsure or are using other models, to explicitly normalize the query vector. Normalizing the indexed vectors during ingestion is likewise standard practice when using Cosine Similarity.
$$v_{\text{normalized}} = \frac{v}{\lVert v \rVert_2}$$
where $\lVert v \rVert_2$ is the Euclidean norm (the square root of the sum of squared components) of the vector $v$.
import numpy as np
# Assuming query_vector is the embedding generated above
norm = np.linalg.norm(query_vector)
if norm > 0:  # Avoid division by zero
    normalized_query_vector = query_vector / norm
else:
    normalized_query_vector = query_vector  # A zero vector has no direction; pass through
# Use normalized_query_vector for the search
# print(f"Normalized Vector Snippet: {normalized_query_vector[:5]}")
The transformation of a raw query into a searchable vector forms a distinct stage within the overall search pipeline: receive the raw query, apply any preprocessing used during indexing, run the same embedding model, and normalize the resulting vector. Consistency with the indexing process, especially in model selection and preprocessing, is vital.
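Pulling these stages together, a minimal sketch of the whole query-vectorization step might look like the following (vectorize_query is an illustrative name; it assumes the minimal-preprocessing setup used above):
import numpy as np
from sentence_transformers import SentenceTransformer

def vectorize_query(raw_query: str, model: SentenceTransformer) -> np.ndarray:
    # 1. Preprocess exactly as the documents were preprocessed (none in this setup)
    text = raw_query
    # 2. Embed with the same model used at indexing time
    vector = model.encode(text)
    # 3. Normalize to unit length so cosine similarity reduces to a dot product
    norm = np.linalg.norm(vector)
    return vector / norm if norm > 0 else vector

model = SentenceTransformer('all-MiniLM-L6-v2')
query_vector = vectorize_query("distributed system monitoring best practices", model)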
Generating an accurate query vector is fundamental to semantic search. This vector encapsulates the meaning of the user's request within the learned high-dimensional space. When we use this vector to query the database (as detailed in Chapter 3 on ANN search), the database's algorithms efficiently find the stored document vectors that are closest (most similar) to this query vector in that semantic space.
Without this transformation, the database wouldn't know where to "look". The query vector serves as the coordinates for the search operation, guiding it towards potentially relevant results based on semantic proximity rather than simple keyword overlap.
With the query vector prepared, the next step is to pass it to the vector database's search interface, initiating the Approximate Nearest Neighbor search to retrieve the most semantically similar document vectors and their associated metadata. This sets the stage for ranking and presenting the results to the user.
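As a simplified illustration of that hand-off, the sketch below searches a small in-memory FAISS index. It uses an exact inner-product index for brevity (the ANN index types covered in Chapter 3 expose the same add/search pattern), and the random doc_vectors are a stand-in for real document embeddings:
import faiss
import numpy as np

d = 384  # dimension of all-MiniLM-L6-v2 embeddings

# Stand-in for the normalized document embeddings created at indexing time
doc_vectors = np.random.rand(1000, d).astype('float32')
faiss.normalize_L2(doc_vectors)  # normalize in place

index = faiss.IndexFlatIP(d)  # inner product == cosine similarity on unit vectors
index.add(doc_vectors)

# FAISS expects a 2D float32 batch of query vectors
query_batch = np.asarray([normalized_query_vector], dtype='float32')
scores, ids = index.search(query_batch, 5)  # top-5 most similar documents
print(ids[0], scores[0])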