As discussed earlier in this chapter, managing conversation history effectively becomes challenging as interactions lengthen. Basic buffer memory eventually overflows the context window, and summarizing memory can lose important details. Vector Store Memory offers a compelling alternative by storing past interactions as embeddings in a vector database and retrieving the most relevant ones semantically when generating a new response. This approach allows the model to recall pertinent information from potentially very long histories, even if it wasn't mentioned recently.
In this practical section, we will implement VectorStoreRetrieverMemory using FAISS, a popular library for efficient similarity search, along with OpenAI's embedding models.
First, ensure you have the necessary libraries installed. We'll need langchain, the OpenAI integration (langchain-openai), the community package that provides the FAISS wrapper (langchain-community), a FAISS build (faiss-cpu or faiss-gpu), and tiktoken for tokenization.
pip install langchain langchain-openai langchain-community faiss-cpu tiktoken
You will also need an OpenAI API key configured in your environment, typically as OPENAI_API_KEY.
Now, let's import the required components:
import os
from langchain_openai import OpenAIEmbeddings, OpenAI
from langchain.memory import VectorStoreRetrieverMemory
from langchain.chains import ConversationChain
from langchain_community.vectorstores import FAISS
from langchain.prompts import PromptTemplate
# Ensure your OPENAI_API_KEY is set in your environment variables
# Example: os.environ["OPENAI_API_KEY"] = "your_api_key_here"
# Check if the API key is available
if not os.getenv("OPENAI_API_KEY"):
    raise ValueError("OPENAI_API_KEY environment variable not set.")
The core idea is to use a vector store to hold the conversation history. Each turn of the conversation (input and output) will be embedded and stored. When generating the next response, we'll use the current input to query the vector store for relevant past exchanges.
Initialize Components: We need an embedding model and a FAISS vector store to hold the conversation history. Because a FAISS index can't be created completely empty via from_texts, we seed it with a single placeholder entry.
# 1. Initialize the embedding model
embedding_model = OpenAIEmbeddings()
# 2. Initialize a FAISS vector store
# The embedding dimensionality (1536 for OpenAIEmbeddings) is inferred automatically.
# from_texts cannot build an empty index, so we seed it with a placeholder entry.
index = FAISS.from_texts(["_initial_"], embedding_model)
Note: We initialize FAISS with the dummy text _initial_ because it cannot be created completely empty via from_texts. This single placeholder entry won't significantly impact retrieval. By default, LangChain's FAISS wrapper ranks results by Euclidean (L2) distance; since OpenAI embeddings are normalized to unit length, this produces the same ordering as cosine similarity or inner product, so the default works well here.
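If you do want a different metric, recent versions of the FAISS wrapper accept a distance_strategy argument. The exact import path and supported values can vary between LangChain releases, so treat the following as a sketch rather than a required step:
# Sketch: building the index with an inner-product metric instead of the default L2 distance.
# Assumes DistanceStrategy is available at this import path in your LangChain version.
from langchain_community.vectorstores.utils import DistanceStrategy

index_ip = FAISS.from_texts(
    ["_initial_"],
    embedding_model,
    distance_strategy=DistanceStrategy.MAX_INNER_PRODUCT,  # same ranking as cosine for normalized embeddings
)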
Create the Retriever: The memory module doesn't interact with the vector store directly; it uses a LangChain Retriever. We create a retriever from our FAISS index. The search_kwargs={'k': 2} parameter tells the retriever to fetch the top 2 most relevant documents (conversation snippets) based on semantic similarity to the current input.
# 3. Create the retriever
# We'll retrieve the top 2 most relevant conversation snippets
retriever = index.as_retriever(search_kwargs=dict(k=2))
Choosing the right value for k is important. A larger k brings more context but increases token usage and the risk of including irrelevant information; a smaller k is more concise but might miss useful context. Experimentation is often required, as in the short sketch below.
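One quick, informal way to experiment is to query the index directly with a representative input and compare what different values of k return. This is only a sketch; it assumes the index already contains a few conversation turns, and the sample query is a hypothetical example:
# Sketch: compare what different k values retrieve for a representative query
sample_query = "What programming language do I prefer?"  # hypothetical example input
for k in (1, 2, 4):
    docs = index.similarity_search(sample_query, k=k)
    print(f"k={k}:")
    for doc in docs:
        print(f"  - {doc.page_content[:80]!r}")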
Instantiate VectorStoreRetrieverMemory: Now, we create the memory object itself, passing in the retriever.
# 4. Instantiate the memory module
memory = VectorStoreRetrieverMemory(retriever=retriever, memory_key="history")
The memory_key="history"
specifies the variable name that will hold the retrieved context within the prompt.
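Before wiring the memory into a chain, you can exercise it directly to see what it stores and retrieves: save_context writes a conversation turn into the vector store, and load_memory_variables runs a retrieval for a new input. A minimal sketch (note that it adds a real entry to the index):
# Sketch: exercising the memory directly (this writes an entry into the FAISS index)
memory.save_context(
    {"input": "My favorite programming language is Python."},
    {"output": "Python is a versatile choice with a rich ecosystem."},
)
# Retrieve whatever the memory considers relevant to a new input
print(memory.load_memory_variables({"input": "Which language do I like?"})["history"])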
Let's integrate this memory into a standard ConversationChain. We need an LLM and a prompt template that includes the history variable (populated by our memory module) and the input variable (the user's current message).
# 5. Initialize the LLM
llm = OpenAI(temperature=0) # Use a deterministic setting for predictability
# 6. Define the Prompt Template
# Note the "{history}" variable, which will be populated by VectorStoreRetrieverMemory
_DEFAULT_TEMPLATE = """The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.
Relevant pieces of previous conversation:
{history}
(You do not need to use these pieces of information if not relevant)
Current conversation:
Human: {input}
AI:"""
PROMPT = PromptTemplate(
input_variables=["history", "input"], template=_DEFAULT_TEMPLATE
)
# 7. Create the ConversationChain
conversation_with_vectorstore_memory = ConversationChain(
llm=llm,
prompt=PROMPT,
memory=memory,
verbose=True # Set to True to see the internal steps
)
Now, let's simulate a conversation. Notice how the memory automatically saves the input/output and retrieves relevant history for subsequent turns.
# First interaction
response = conversation_with_vectorstore_memory.predict(input="My favorite programming language is Python because it's versatile.")
print(response)
# Second interaction - unrelated
response = conversation_with_vectorstore_memory.predict(input="The weather today is sunny.")
print(response)
# Third interaction - refers back to the first statement implicitly
response = conversation_with_vectorstore_memory.predict(input="Why did I mention I liked Python?")
print(response)
If you run this with verbose=True, you'll see output similar to the following (simplified) for the third interaction:
> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. ...
Relevant pieces of previous conversation:
Human: My favorite programming language is Python because it's versatile.
AI: That's great! Python is indeed known for its versatility, readability, and extensive libraries. It's used in web development, data science, AI, scripting, and much more.
(You do not need to use these pieces of information if not relevant)
Current conversation:
Human: Why did I mention I liked Python?
AI:
> Finished chain.
You mentioned you liked Python because of its versatility.
Notice how the "Relevant pieces of previous conversation:" section was populated by VectorStoreRetrieverMemory retrieving the first interaction based on the semantic content of the third input ("Why did I mention I liked Python?"). The second, unrelated exchange about the weather was likely not retrieved, or ranked lower, because it was semantically dissimilar.
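To see why this happens, you can inspect retrieval outside the chain. FAISS exposes similarity_search_with_score, which returns (document, distance) pairs; with the default L2 metric, lower scores mean closer matches. A quick sketch:
# Sketch: inspect which stored turns match the query and how closely
results = index.similarity_search_with_score("Why did I mention I liked Python?", k=3)
for doc, score in results:
    print(f"distance={score:.4f} | {doc.page_content[:80]!r}")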
The process within the chain when using VectorStoreRetrieverMemory can be visualized as follows:
Flow diagram illustrating the steps involved when using Vector Store Memory in a ConversationChain. User input triggers retrieval before prompt formatting, and the input/output pair is saved after the response is generated.
Retrieval Parameter (k): The k value passed via as_retriever(search_kwargs=dict(k=2)) is the primary tuning parameter. Increasing k provides more context but increases prompt size and cost; decreasing it saves tokens but might omit relevant information. You can also explore other search_type options, such as "mmr" (Maximal Marginal Relevance), to balance relevance and diversity in the retrieved documents, as sketched below.
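As a sketch, switching the retriever to MMR looks like this; fetch_k controls how many candidates are considered before the final k diverse results are selected:
# Sketch: an MMR retriever that balances relevance with diversity among retrieved snippets
mmr_retriever = index.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 2, "fetch_k": 8},  # consider 8 candidates, return 2 diverse ones
)
mmr_memory = VectorStoreRetrieverMemory(retriever=mmr_retriever, memory_key="history")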
Persistence: The FAISS index in our example is in-memory and will be lost when the script ends. For production use, you'd typically want persistence. You can save and load a FAISS index locally:
# To save the index
index.save_local("my_faiss_index")
# To load the index later (requires the embedding model)
loaded_index = FAISS.load_local("my_faiss_index", embedding_model, allow_dangerous_deserialization=True)
retriever = loaded_index.as_retriever(search_kwargs=dict(k=2))
memory = VectorStoreRetrieverMemory(retriever=retriever, memory_key="history")
# ... re-create the chain using this memory
Security Note: Loading a FAISS index saved with save_local involves unpickling data, which is a security risk if the index files come from an untrusted source; this is why the allow_dangerous_deserialization=True flag is required. For production systems that interact with potentially untrusted data, consider more secure serialization methods or a managed vector database service. Alternatively, use one of the cloud-based vector stores (Pinecone, Weaviate, etc.) discussed earlier, which handle persistence and scaling for you.
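As one illustration, a locally persistent alternative is Chroma, which writes to disk automatically and avoids FAISS's pickle-based deserialization. This sketch assumes the langchain-chroma and chromadb packages are installed; the collection name and directory are arbitrary examples:
# Sketch: swapping FAISS for a persistent Chroma store (assumes `pip install langchain-chroma chromadb`)
from langchain_chroma import Chroma

chroma_store = Chroma(
    collection_name="conversation_memory",  # arbitrary name for this example
    embedding_function=embedding_model,
    persist_directory="./chroma_memory",    # data is persisted to this directory
)
chroma_retriever = chroma_store.as_retriever(search_kwargs={"k": 2})
chroma_memory = VectorStoreRetrieverMemory(retriever=chroma_retriever, memory_key="history")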
Here is the full script combining the steps:
import os
from langchain_openai import OpenAIEmbeddings, OpenAI
from langchain.memory import VectorStoreRetrieverMemory
from langchain.chains import ConversationChain
from langchain_community.vectorstores import FAISS
from langchain.prompts import PromptTemplate
# Ensure your OPENAI_API_KEY is set
if not os.getenv("OPENAI_API_KEY"):
    raise ValueError("OPENAI_API_KEY environment variable not set.")
# 1. Initialize Embeddings
embedding_model = OpenAIEmbeddings()
# 2. Initialize FAISS Vector Store
# from_texts requires at least one text, so a new index is seeded with a placeholder entry
try:
    # Try loading an index saved by a previous run
    index = FAISS.load_local("my_faiss_index", embedding_model, allow_dangerous_deserialization=True)
    print("Loaded existing FAISS index.")
except Exception:
    print("Creating new FAISS index.")
    # The embedding dimensionality (1536 for OpenAIEmbeddings) is inferred automatically
    index = FAISS.from_texts(["_initial_"], embedding_model)
# 3. Create Retriever (retrieve top 2 relevant snippets)
retriever = index.as_retriever(search_kwargs=dict(k=2))
# 4. Instantiate Memory
memory = VectorStoreRetrieverMemory(retriever=retriever, memory_key="history")
# 5. Initialize LLM
llm = OpenAI(temperature=0)
# 6. Define Prompt Template
_DEFAULT_TEMPLATE = """The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.
Relevant pieces of previous conversation:
{history}
(You do not need to use these pieces of information if not relevant)
Current conversation:
Human: {input}
AI:"""
PROMPT = PromptTemplate(
input_variables=["history", "input"], template=_DEFAULT_TEMPLATE
)
# 7. Create Conversation Chain
conversation_with_vectorstore_memory = ConversationChain(
llm=llm,
prompt=PROMPT,
memory=memory,
verbose=False # Set to True to see detailed logs
)
# --- Run Conversation ---
print("Starting conversation (type 'quit' to exit):")
while True:
    user_input = input("Human: ")
    if user_input.lower() == 'quit':
        break
    response = conversation_with_vectorstore_memory.predict(input=user_input)
    print(f"AI: {response}")
# --- Save the index before exiting ---
try:
    index.save_local("my_faiss_index")
    print("Saved FAISS index.")
except Exception as e:
    print(f"Error saving FAISS index: {e}")
print("Conversation ended.")
Retrieval Quality: The usefulness of this memory depends on retrieval quality. Poor retrieval (for example, a badly chosen k, a weak embedding model, or noisy history) will lead to irrelevant context being fed to the LLM. Techniques like re-ranking or query transformation (discussed in Chapter 4) can sometimes help.

This practical exercise demonstrated how to implement VectorStoreRetrieverMemory, providing a powerful mechanism for maintaining long-term, semantically relevant context in conversational applications. By storing history in a vector store, you overcome the limitations of simple buffers and enable more coherent and knowledgeable interactions over extended periods. Remember to tune the retrieval parameters and consider persistence strategies based on your application's requirements.