When working with documents broken into manageable chunks, a primary challenge is efficiently finding the chunks most relevant to a user's question. A simple keyword search proves insufficient, as it often fails to capture synonyms, context, and the underlying meaning of a query. This necessitates a method for semantic search, which understands the intent behind the words. Semantic search is accomplished by converting text chunks into a numerical format called an embedding, allowing their meanings to be compared mathematically.
An embedding is a dense vector, essentially a list of numbers, that represents a piece of text as a point in a high-dimensional space. These vectors are generated by specialized models trained to capture the semantic properties of language. The defining characteristic of a well-constructed embedding space is that texts with similar meanings have vectors that lie close to one another.
For example, the vectors for "cat" and "kitten" would be much closer to each other than the vectors for "cat" and "airplane". This property allows us to find relevant documents by searching for vectors in this space that are nearest to our query's vector.
Semantically related items cluster together in the embedding space, while unrelated items are farther apart.
LangChain provides a standard interface for many embedding model providers. To generate an embedding, you instantiate a class for your chosen provider, such as OpenAIEmbeddings or HuggingFaceEmbeddings, and use its embed_query method.
from langchain_openai import OpenAIEmbeddings
# Note: Requires an OPENAI_API_KEY environment variable
embeddings_model = OpenAIEmbeddings()
query_embedding = embeddings_model.embed_query(
    "How is a vector store used in a RAG system?"
)
# The result is a list of floating-point numbers
print(f"Vector dimension: {len(query_embedding)}")
print(f"First 5 elements: {query_embedding[:5]}")
Vector dimension: 1536
First 5 elements: [-0.0123..., 0.0045..., -0.0218..., -0.0076..., 0.0091...]
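When indexing, you typically embed many chunks at once rather than one query at a time. The standard LangChain embeddings interface also provides an embed_documents method for this. The sketch below reuses the embeddings_model from above; the example strings are illustrative.

# Embed several chunks in one call with embed_documents
chunk_texts = [
    "Vector stores index embeddings for fast similarity search.",
    "Cosine similarity compares the angle between two vectors.",
]
chunk_embeddings = embeddings_model.embed_documents(chunk_texts)

# One vector per input text, each with the same dimension as the query embedding
print(len(chunk_embeddings))      # 2
print(len(chunk_embeddings[0]))   # 1536 for the default OpenAI model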
Once we can convert text to vectors, we need a way to measure the "closeness" between them. The most common metric for this is cosine similarity, which measures the cosine of the angle between two vectors. A cosine similarity of 1 means the vectors point in the exact same direction (highly similar), 0 means they are orthogonal (unrelated), and -1 means they are opposite (dissimilar). The formula is given by:

$$\text{similarity}(\mathbf{A}, \mathbf{B}) = \cos(\theta) = \frac{\mathbf{A} \cdot \mathbf{B}}{\|\mathbf{A}\|\,\|\mathbf{B}\|}$$
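To make this concrete, here is a minimal sketch that computes cosine similarity with NumPy and applies it to embeddings of "cat", "kitten", and "airplane" from the earlier example. NumPy is an extra dependency assumed for this illustration, and the cosine_similarity helper is our own, not a LangChain function.

import numpy as np

def cosine_similarity(a, b):
    # cos(theta) = (A . B) / (||A|| * ||B||)
    a, b = np.array(a), np.array(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

cat = embeddings_model.embed_query("cat")
kitten = embeddings_model.embed_query("kitten")
airplane = embeddings_model.embed_query("airplane")

print(cosine_similarity(cat, kitten))    # higher: related concepts
print(cosine_similarity(cat, airplane))  # lower: unrelated concepts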
While we could manually calculate the similarity between a query vector and every single document chunk vector, this approach becomes computationally expensive and slow as the number of documents grows. This is where a vector store comes in.
A vector store is a specialized database optimized for one primary task: performing extremely fast similarity searches on large collections of vectors. It accomplishes this using algorithms for Approximate Nearest Neighbor (ANN) search, which find the "closest" vectors without needing to compare the query to every single entry.
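To see what a vector store is optimizing away, the sketch below shows the naive alternative: a linear scan that scores every stored vector against the query and sorts the results. It reuses the cosine_similarity helper defined above; the function name and toy structure are illustrative, not part of any library.

# Naive nearest-neighbor search: every query compares against every vector
def brute_force_search(query_vec, doc_vecs, k=2):
    scored = [
        (cosine_similarity(query_vec, vec), idx)
        for idx, vec in enumerate(doc_vecs)
    ]
    scored.sort(reverse=True)
    return scored[:k]  # top-k (score, index) pairs

# Fine for a handful of chunks, but cost grows linearly with the collection,
# which is exactly the work ANN indexes in a vector store avoid.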
A vector store performs two main jobs in a RAG pipeline:

1. Indexing: it stores each document chunk together with its embedding so the collection can be searched efficiently.
2. Retrieval: given a query embedding, it returns the chunks whose embeddings are most similar to it.
LangChain integrates with dozens of vector stores, from lightweight options that run locally or in memory, such as Chroma and FAISS, to production-grade, standalone databases like Pinecone and Weaviate. For development and prototyping, a lightweight option like Chroma is an excellent starting point.
The following example demonstrates the complete workflow: taking document chunks, embedding them, and indexing them in a Chroma vector store. LangChain's vector store integrations often provide a convenient from_documents class method that handles these steps for you.
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain_core.documents import Document
# Example document chunks from the previous step
documents = [
Document(page_content="A vector store is a database designed to store and retrieve vector embeddings efficiently."),
Document(page_content="LangChain provides integrations with many vector stores, including Chroma, FAISS, and Pinecone."),
Document(page_content="Embeddings are numerical representations of text that capture semantic meaning."),
Document(page_content="The most common similarity metric used in vector search is cosine similarity.")
]
# Initialize the embeddings model
embeddings = OpenAIEmbeddings()
# Create a Chroma vector store from the documents
# This will embed and index the documents automatically
vectorstore = Chroma.from_documents(documents, embeddings)
# Now, we can perform a similarity search
query = "What is an embedding?"
retrieved_docs = vectorstore.similarity_search(query)
# The result is a list of Document objects, ranked by relevance
print(retrieved_docs[0].page_content)
Embeddings are numerical representations of text that capture semantic meaning.
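If you need more control over the search, similarity_search accepts a k parameter to limit the number of results, and Chroma also exposes similarity_search_with_score, which returns a score alongside each document. A short sketch; note that the score's exact meaning depends on the store's configured distance metric.

# Limit the number of results and inspect relevance scores
results_with_scores = vectorstore.similarity_search_with_score(query, k=2)

for doc, score in results_with_scores:
    print(f"{score:.4f}  {doc.page_content}")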
The diagram below illustrates the full data flow for both ingesting documents into a vector store and retrieving them based on a query.
The ingestion pipeline processes and stores documents, while the retrieval pipeline uses the stored index to find relevant information for a user's query.
By creating and populating a vector store, we have transformed our collection of static documents into a dynamic, searchable knowledge base. The next section will show how to wrap this search functionality into a Retriever, a standard LangChain component used for fetching data within larger application chains.