While short-term memory mechanisms handle immediate context, agentic systems often need to access vast amounts of information or recall experiences that far exceed the LLM's context window $L_{context}$. This necessitates persistent, long-term memory solutions. Unlike simple keyword search, agents frequently need to retrieve information based on conceptual similarity or meaning. This is where vector stores and text embeddings become essential components.
The foundation of modern long-term memory for LLM agents lies in semantic search. Instead of matching exact words, semantic search finds information based on the similarity of meaning. This is achieved by transforming text into numerical representations called embeddings.
An embedding is a dense vector $e$ in a high-dimensional space, generated by an embedding model $f$, such that $e = f(\text{text})$. These models are trained so that texts with similar meanings are mapped to points that are close to each other in the vector space. For instance, "agent memory systems" and "storing information for autonomous AI" would likely have embeddings that are closer together than "agent memory systems" and "financial market analysis".
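As a concrete illustration, the short sketch below generates embeddings with the open-source sentence-transformers library. The model name "all-MiniLM-L6-v2" (a small SBERT variant producing 384-dimensional vectors) is one possible choice for illustration, not a requirement.

```python
from sentence_transformers import SentenceTransformer

# Illustrative model choice: a small SBERT variant with 384-dim output.
model = SentenceTransformer("all-MiniLM-L6-v2")

texts = [
    "agent memory systems",
    "storing information for autonomous AI",
    "financial market analysis",
]

# e = f(text): each string is mapped to a dense numerical vector.
embeddings = model.encode(texts)
print(embeddings.shape)  # (3, 384) for this model
```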
Common embedding models include Sentence-BERT (SBERT) variants, OpenAI's Ada embeddings, and Cohere's embedding models. They differ in dimensionality (e.g., 384, 768, or 1536 dimensions), training objectives, and the nuances of semantic similarity they capture.
The "closeness" between two embeddings, and thus the semantic similarity between the original texts, is typically measured using distance metrics like Euclidean distance or, more commonly, cosine similarity. Cosine similarity measures the cosine of the angle between two vectors, ranging from -1 (opposite meanings) to 1 (identical meanings), with 0 indicating orthogonality or unrelatedness. For two non-zero embedding vectors A and B:
$$\text{similarity}(A, B) = \cos(\theta) = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}$$

A higher cosine similarity score indicates greater semantic relevance.
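In code, cosine similarity reduces to a dot product divided by the product of the vector norms. The toy vectors below are made up for illustration; in practice you would compare real embedding vectors.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two non-zero vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([0.2, 0.9, 0.1])
b = np.array([0.25, 0.8, 0.05])  # points in a similar direction to a
c = np.array([-0.7, 0.1, 0.6])   # points in a very different direction

print(cosine_similarity(a, b))  # close to 1: high similarity
print(cosine_similarity(a, c))  # close to 0: largely unrelated
```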
A simplified 2D projection of text embeddings. Points clustered together represent semantically similar documents. A query vector (star) retrieves the nearest document vectors (blue cluster).
Storing and efficiently querying millions or billions of high-dimensional vectors requires specialized databases known as vector stores or vector databases. Examples include managed services like Pinecone and self-hosted options like Chroma, Milvus, Weaviate, or libraries like FAISS (Facebook AI Similarity Search).
These databases are optimized for Approximate Nearest Neighbor (ANN) search. Given a query vector $e_{query}$, the goal is to find the $k$ vectors in the database that are closest to $e_{query}$ under a chosen similarity or distance measure (such as cosine similarity or Euclidean distance).
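The sketch below shows one way to run such a search with FAISS, mentioned above. The HNSW index type and the random stand-in vectors are illustrative assumptions; normalizing the vectors to unit length makes the inner product equal to cosine similarity.

```python
import numpy as np
import faiss

dim, n = 384, 10_000
rng = np.random.default_rng(0)

# Stand-in embeddings; in practice these come from an embedding model.
vectors = rng.standard_normal((n, dim)).astype("float32")
faiss.normalize_L2(vectors)  # unit length: inner product == cosine similarity

# HNSW graph index for approximate nearest neighbor (ANN) search.
index = faiss.IndexHNSWFlat(dim, 32, faiss.METRIC_INNER_PRODUCT)
index.add(vectors)

# Embed the query the same way, then retrieve the k most similar vectors.
query = rng.standard_normal((1, dim)).astype("float32")
faiss.normalize_L2(query)
scores, ids = index.search(query, 5)
print(ids[0], scores[0])
```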
Indexing and Querying Workflow:
Workflow for retrieving information from a vector store within an agentic system. The offline indexing process involves embedding and storing text chunks. During runtime, the agent formulates a query based on its task, embeds it, searches the vector database using ANN, retrieves relevant chunks, and uses them to augment its context for the core LLM.
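Putting the two phases together, here is a minimal end-to-end sketch of that workflow, combining sentence-transformers for embedding with a FAISS index for retrieval. The chunk texts, query, and prompt format are hypothetical, and a flat (exact) index stands in for the ANN index a production system would use.

```python
from sentence_transformers import SentenceTransformer
import faiss

model = SentenceTransformer("all-MiniLM-L6-v2")

# --- Offline indexing: embed and store text chunks ---
chunks = [
    "The agent stores episodic memories after each task.",
    "Vector databases support metadata filtering at query time.",
    "Cosine similarity compares the direction of two embeddings.",
]
chunk_vecs = model.encode(chunks, normalize_embeddings=True).astype("float32")
index = faiss.IndexFlatIP(chunk_vecs.shape[1])
index.add(chunk_vecs)

# --- Runtime: embed the agent's query, retrieve, and augment the prompt ---
query = "How do I compare two embeddings?"
query_vec = model.encode([query], normalize_embeddings=True).astype("float32")
_, ids = index.search(query_vec, 2)
retrieved = [chunks[i] for i in ids[0]]

prompt = "Context:\n" + "\n".join(retrieved) + f"\n\nQuestion: {query}"
# `prompt` would now be passed to the core LLM.
```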
Vector databases abstract away the complexities of managing ANN indexes and provide APIs for easy insertion, deletion, and querying of vector data. They are the cornerstone for providing LLMs with scalable, searchable long-term memory. The selection of a specific vector database often depends on factors like scalability requirements, hosting preferences (cloud vs. on-premise), desired consistency guarantees, and support for metadata filtering during search. The subsequent section on "Advanced Retrieval Strategies" will explore techniques to enhance the quality of information retrieved from these systems.
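As one example of metadata filtering during search, the sketch below uses Chroma's Python client; the collection name, documents, and metadata keys are invented for illustration.

```python
import chromadb

client = chromadb.Client()  # in-memory instance; persistent clients also exist
collection = client.create_collection("agent_memory")

# Chroma embeds documents with a default embedding function if none is given.
collection.add(
    ids=["m1", "m2"],
    documents=[
        "User prefers concise answers.",
        "Quarterly report summarized for the finance team.",
    ],
    metadatas=[{"kind": "preference"}, {"kind": "episodic"}],
)

# The `where` filter restricts the search to records with matching metadata.
results = collection.query(
    query_texts=["What does the user like?"],
    n_results=1,
    where={"kind": "preference"},
)
print(results["documents"])
```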