Theory provides the foundation, but practical implementation solidifies understanding. This section guides you through building and integrating a persistent long-term memory module for an agent using a vector database. We'll move beyond the conceptual understanding of vector stores discussed earlier and implement the core mechanisms for storing and retrieving information based on semantic similarity. This hands-on exercise is fundamental for creating agents capable of maintaining context, learning from past interactions, and accessing relevant knowledge over extended periods.
We will use ChromaDB, an open-source embedding database, for this practical example due to its ease of setup for local development. However, the principles demonstrated here are broadly applicable to other vector databases like Pinecone, Weaviate, or FAISS with appropriate API changes.
Before starting, ensure you have the necessary libraries installed. You'll need an LLM library (like langchain or llama-index, although we'll focus on the core logic here), the vector database client, and a sentence transformer library for generating embeddings.
pip install chromadb sentence-transformers
# Optional: Install langchain or llama-index if integrating into their framework
# pip install langchain openai # Example using LangChain with OpenAI
You will also need access to an embedding model. We'll use a model from the sentence-transformers library, which is downloaded automatically on first use.
The core components of our vector database memory system are the embedding function and the vector store itself.
First, we need a way to convert text into dense vector representations (embeddings). The sentence-transformers library provides easy access to various pre-trained models.
from sentence_transformers import SentenceTransformer
# Load a pre-trained embedding model
# 'all-MiniLM-L6-v2' is small and fast; 'all-mpnet-base-v2' gives higher-quality embeddings at greater compute cost.
# Choose a model appropriate for your performance needs and resource constraints.
embedding_model_name = 'all-MiniLM-L6-v2'
embedding_function = SentenceTransformer(embedding_model_name)
# Example: Get the embedding for a piece of text
text_example = "This is a sample sentence for embedding."
vector_example = embedding_function.encode(text_example)
print(f"Embedding dimension: {len(vector_example)}")
# Output: Embedding dimension: 384 (for all-MiniLM-L6-v2)
This embedding_function will be used whenever we need to add text to our memory or query it.
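To make "semantic similarity" concrete before we wire up the database, the short sketch below (an illustration with made-up sentences, not part of the memory system itself) compares embeddings with cosine similarity; related sentences score noticeably higher than unrelated ones.
import numpy as np
# Encode a few sentences and compare them with cosine similarity.
sentences = [
    "The user asked about the weather in London.",
    "What is the forecast for London today?",
    "The agent stored a note about database schemas.",
]
vectors = embedding_function.encode(sentences)
def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
print(cosine_similarity(vectors[0], vectors[1]))  # Related sentences -> relatively high score
print(cosine_similarity(vectors[0], vectors[2]))  # Unrelated sentences -> lower score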
Now, let's set up ChromaDB. We'll configure it for local persistence, meaning the data will be saved to disk.
import chromadb
import uuid # For generating unique IDs
# Set up a persistent ChromaDB client
# Data will be stored in the 'agent_memory_db' directory
client = chromadb.PersistentClient(path="./agent_memory_db")
# Create or get a collection (similar to a table in a relational database).
# In this example we compute embeddings ourselves and pass them to Chroma explicitly;
# Chroma can also manage the embedding function directly (an alternative is shown below).
collection_name = "agent_long_term_memory"
try:
    collection = client.get_collection(name=collection_name)
    print(f"Collection '{collection_name}' loaded.")
except Exception:  # Replace with more specific exception handling in production
    print(f"Creating collection '{collection_name}'...")
    # Note: ChromaDB can also accept an embedding_function directly during creation
    collection = client.create_collection(name=collection_name)
    print(f"Collection '{collection_name}' created.")
We now have a collection object representing our agent's long-term memory store.
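As the comments above note, ChromaDB can also manage embeddings itself. The following sketch shows that alternative setup using Chroma's built-in sentence-transformers wrapper; the agent_memory_managed collection name is just an example, and the rest of this section continues with the explicit-embedding approach.
from chromadb.utils import embedding_functions
# Alternative: let Chroma compute embeddings via its sentence-transformers wrapper
chroma_ef = embedding_functions.SentenceTransformerEmbeddingFunction(model_name=embedding_model_name)
managed_collection = client.get_or_create_collection(
    name="agent_memory_managed",
    embedding_function=chroma_ef
)
# With a managed embedding function, raw documents can be added and queried directly
managed_collection.add(documents=["Example memory handled by Chroma."], ids=["example-1"])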
An agent needs to perform two primary operations on its memory: writing (adding information) and reading (retrieving information).
When the agent encounters new information, completes a task, or generates a significant thought, this should be stored. We need to embed the text and add it to the collection along with a unique ID and potentially useful metadata.
import datetime

def add_memory(text_content: str, metadata: dict = None):
    """
    Adds a piece of text content to the vector store memory.

    Args:
        text_content: The string content to store.
        metadata: Optional dictionary containing metadata (e.g., timestamp, source).
    """
    if not text_content:
        print("Warning: Attempted to add empty content to memory.")
        return
    # Generate a unique ID for the memory entry
    memory_id = str(uuid.uuid4())
    # Generate the embedding for the text content
    embedding = embedding_function.encode(text_content).tolist()  # Ensure it's a plain list
    # Prepare metadata - ensure it's serializable and ChromaDB compatible
    final_metadata = metadata if metadata else {}
    # Add a timestamp automatically if not provided
    if 'timestamp' not in final_metadata:
        final_metadata['timestamp'] = datetime.datetime.now(datetime.timezone.utc).isoformat()
    # Add the memory to the collection
    try:
        collection.add(
            embeddings=[embedding],
            documents=[text_content],
            metadatas=[final_metadata],
            ids=[memory_id]
        )
        print(f"Added memory: ID={memory_id}, Content='{text_content[:50]}...'")
    except Exception as e:
        print(f"Error adding memory: {e}")
# --- Example Usage ---
add_memory("The user asked about the weather in London.")
add_memory("Agent determined the weather is partly cloudy, 15°C.", metadata={"source": "weather_tool_api"})
add_memory("Plan: 1. Check user request. 2. Query weather API. 3. Format response.")
When the agent needs to recall relevant information (e.g., to answer a question, inform its planning, or maintain context), it queries the vector store. The query itself is embedded, and the database returns the most semantically similar entries.
def retrieve_memories(query_text: str, n_results: int = 3, filter_metadata: dict = None):
    """
    Retrieves relevant memories from the vector store based on semantic similarity.

    Args:
        query_text: The text query to search for.
        n_results: The maximum number of relevant memories to return.
        filter_metadata: Optional dictionary to filter memories based on metadata.

    Returns:
        A list of retrieved documents (memories).
    """
    if not query_text:
        print("Warning: Attempted to retrieve memories with an empty query.")
        return []
    # Generate the embedding for the query
    query_embedding = embedding_function.encode(query_text).tolist()
    # Only pass a where filter if metadata filtering is requested
    # (an empty dict is not a valid ChromaDB filter)
    where_filter = filter_metadata if filter_metadata else None
    try:
        results = collection.query(
            query_embeddings=[query_embedding],
            n_results=n_results,
            where=where_filter,  # Apply metadata filter here
            include=['documents', 'metadatas', 'distances']  # Request documents, metadata, and distances
        )
        # Extract and return the relevant information
        retrieved_docs = []
        if results and results.get('documents') and results['documents'][0]:
            print(f"Retrieved {len(results['documents'][0])} memories for query '{query_text[:50]}...'")
            # Combine documents with their metadata for context
            for doc, meta, dist in zip(results['documents'][0], results['metadatas'][0], results['distances'][0]):
                retrieved_docs.append({
                    "content": doc,
                    "metadata": meta,
                    # Rough similarity proxy: with Chroma's default L2 metric, smaller distance is better;
                    # 1 - distance is only a true similarity score for cosine distance.
                    "similarity_score": 1 - dist
                })
            # Chroma returns results ordered by distance; sort explicitly for safety
            retrieved_docs.sort(key=lambda x: x['similarity_score'], reverse=True)
            return retrieved_docs
        else:
            print("No relevant memories found.")
            return []
    except Exception as e:
        print(f"Error retrieving memories: {e}")
        return []
# --- Example Usage ---
print("\n--- Retrieving relevant memories ---")
query = "What was the weather inquiry about?"
relevant_memories = retrieve_memories(query, n_results=2)
if relevant_memories:
    print(f"\nMost relevant memories for query: '{query}'")
    for i, mem in enumerate(relevant_memories):
        print(f"{i+1}. Score: {mem['similarity_score']:.4f} | Content: {mem['content']} | Meta: {mem['metadata']}")
# Example with metadata filter
print("\n--- Retrieving memories specifically from the weather tool ---")
tool_memories = retrieve_memories("weather information", n_results=1, filter_metadata={"source": "weather_tool_api"})
if tool_memories:
    print("\nMemories from 'weather_tool_api':")
    for mem in tool_memories:
        print(f"- Score: {mem['similarity_score']:.4f} | Content: {mem['content']} | Meta: {mem['metadata']}")
Now, let's consider how these add_memory and retrieve_memories functions fit into a simplified agent's operational cycle. Imagine a basic agent that takes input, thinks (retrieves memory, plans), acts, and observes results.
# --- Simplified Agent Simulation ---
def agent_step(user_input: str = None, previous_observation: str = None):
    """Simulates a single step of an agent using memory."""
    print("\n--- Agent Step ---")
    # 1. Gather Context (Input, Past Observations)
    context = ""
    if user_input:
        context += f"User Input: {user_input}\n"
        add_memory(f"Received user input: {user_input}", {"type": "user_interaction"})  # Store interaction
    if previous_observation:
        context += f"Previous Observation: {previous_observation}\n"
        # Decide if the observation is worth storing long-term
        if len(previous_observation) > 10:  # Simple heuristic
            add_memory(f"Observation: {previous_observation}", {"type": "agent_observation"})
    # 2. Retrieve Relevant Memories
    query_for_memory = f"Current context: {context} What should I recall?"
    # More sophisticated query generation is needed in practice
    retrieved = retrieve_memories(query_for_memory, n_results=3)
    memory_context = "\nRelevant Past Information:\n"
    if retrieved:
        for mem in retrieved:
            memory_context += f"- {mem['content']} (Timestamp: {mem['metadata'].get('timestamp', 'N/A')})\n"
    else:
        memory_context += "- None found.\n"
    # 3. Thinking/Planning (Simplified: just print retrieved context)
    # In a real agent, this context would feed into the LLM prompt for reasoning/planning
    print(f"Context provided for LLM (Input + Retrieved Memory):\n{context}{memory_context}")
    simulated_llm_prompt = f"{context}{memory_context}\nGiven this, what is the next action?"
    print(f"Simulated Prompt Snippet:\n{simulated_llm_prompt[:200]}...")  # Show part of potential prompt
    # 4. Action (Simplified: placeholder action)
    action = "Simulated Action: Query Database for 'London Population'"
    print(f"Agent Action: {action}")
    # Execute action... (omitted)
    # 5. Observation (Simulated result of action)
    observation = "Database Result: London population is approximately 9 million."
    print(f"Agent Observation: {observation}")
    # Return observation for the next step
    return observation
# --- Run a few steps ---
observation = None
observation = agent_step(user_input="Hi agent, tell me about London.", previous_observation=observation)
observation = agent_step(previous_observation=observation) # Agent continues based on previous observation
observation = agent_step(user_input="What did I ask about initially?", previous_observation=observation)
This simulation demonstrates the basic flow: inputs and observations are potentially added to memory, and relevant memories are retrieved to enrich the context for the agent's next decision or response.
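In a real agent, the combined context and memory_context would feed into an actual LLM call rather than just being printed. The sketch below shows one way to do that wiring; it assumes the official openai Python client, an OPENAI_API_KEY environment variable, and a placeholder model name, and any chat-capable LLM could stand in.
# Sketch only: wiring retrieved memory into an LLM call.
# Assumes `pip install openai`, an OPENAI_API_KEY environment variable,
# and that `context` and `memory_context` were built as in agent_step above.
from openai import OpenAI

llm_client = OpenAI()

def decide_next_action(context: str, memory_context: str) -> str:
    response = llm_client.chat.completions.create(
        model="gpt-4o-mini",  # Placeholder model name; substitute your own
        messages=[
            {"role": "system", "content": "You are an agent deciding its next action."},
            {"role": "user", "content": f"{context}{memory_context}\nGiven this, what is the next action?"}
        ]
    )
    return response.choices[0].message.content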
The interaction between the agent, embedding model, and vector database can be visualized as follows:
Diagram: Data flow for adding and retrieving memories using a vector database. The agent uses an embedding model to convert text to vectors for storage and querying.
This practical setup provides a functional long-term memory, but several areas flagged in the code comments, such as the broad exception handling, the simple heuristic for deciding what to store, and the naive memory-query construction, would need refinement for production systems.
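As one illustration of such a refinement, the hedged sketch below adds a simple deduplication check before storing a new memory, reusing the retrieve_memories helper defined earlier; the add_memory_deduplicated name and the 0.9 similarity threshold are assumptions for illustration only.
# Sketch: skip storing a memory if a very similar one already exists.
# The threshold is an arbitrary assumption and depends on the collection's distance metric.
def add_memory_deduplicated(text_content: str, metadata: dict = None, similarity_threshold: float = 0.9):
    existing = retrieve_memories(text_content, n_results=1)
    if existing and existing[0]["similarity_score"] >= similarity_threshold:
        print(f"Skipping near-duplicate memory: '{text_content[:50]}...'")
        return
    add_memory(text_content, metadata)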
This hands-on exercise equips you with the core skills to implement vector database memory for your agents. By mastering these techniques, you can build agents that exhibit greater coherence, learn from their experiences, and leverage vast knowledge repositories effectively. Experiment with different data types, metadata structures, and retrieval parameters to tailor the memory system to your agent's specific requirements.