For sequential operations managed by Chains, integrating memory is best handled using the LangChain Expression Language (LCEL). The standard approach involves wrapping your chain logic with a history management runnable.
Instead of passing a memory object into a legacy chain class, you use RunnableWithMessageHistory. This separates the core logic of your chain from the persistence of conversation history.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_community.chat_message_histories import ChatMessageHistory
# Initialize the LLM
llm = ChatOpenAI(model="gpt-4o", temperature=0)
# Create a prompt that accepts a history placeholder
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}"),
])
# Create the LCEL chain
chain = prompt | llm
# Define a function to manage chat history state
store = {}
def get_session_history(session_id: str):
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]
# Wrap the chain with message history functionality
conversation = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="history",
)
# The chain now manages history based on session_id
response = conversation.invoke(
    {"input": "Hi there! My name is Alex."},
    config={"configurable": {"session_id": "user_123"}}
)
print(response.content)
# Output: Hello Alex! It's nice to meet you. How can I help you today?
response = conversation.invoke(
    {"input": "What is my name?"},
    config={"configurable": {"session_id": "user_123"}}
)
print(response.content)
# Output: Your name is Alex.
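Because the history store is keyed by session_id, each session gets its own isolated history. As a quick illustration using the same wrapped chain, invoking it with a different session_id (the "user_456" identifier here is just an example) gives the model no memory of Alex:

# A new session_id starts with an empty history
response = conversation.invoke(
    {"input": "What is my name?"},
    config={"configurable": {"session_id": "user_456"}}
)
print(response.content)
# The model has no prior messages for this session, so it cannot recall the name.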
When constructing chains that require both long-term storage (such as a vector database) and short-term conversational memory, you can combine a retrieval step, injected into the chain's input with RunnablePassthrough.assign, with RunnableWithMessageHistory for the conversation context.
import faiss
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import FAISS
from langchain_community.docstore.in_memory import InMemoryDocstore
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_core.runnables import RunnablePassthrough
from langchain_community.chat_message_histories import ChatMessageHistory
# Setup Vector Store
embedding_model = OpenAIEmbeddings()
index = faiss.IndexFlatL2(1536)  # 1536 matches the default OpenAI embedding dimension
# Initialize an empty vector store
vectorstore = FAISS(
    embedding_function=embedding_model,
    index=index,
    docstore=InMemoryDocstore({}),
    index_to_docstore_id={},
)
vectorstore.add_texts(["Bob lives in California.", "Bob enjoys hiking."])
retriever = vectorstore.as_retriever(search_kwargs={"k": 1})
# Setup LLM
llm = ChatOpenAI(model="gpt-4o", temperature=0)
# Create a prompt that includes context and history
template = """Answer the question based only on the following context:
{context}"""
prompt = ChatPromptTemplate.from_messages([
    ("system", template),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}"),
])
# RAG Chain
def format_docs(docs):
    return "\n\n".join([d.page_content for d in docs])

rag_chain = (
    RunnablePassthrough.assign(
        context=lambda x: format_docs(retriever.invoke(x["input"]))
    )
    | prompt
    | llm
)
# State management
store = {}
def get_session_history(session_id: str):
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]
conversation_with_retrieval = RunnableWithMessageHistory(
    rag_chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="history",
)
# Interaction
response = conversation_with_retrieval.invoke(
    {"input": "Where does Bob live?"},
    config={"configurable": {"session_id": "bob_session"}}
)
print(response.content)
# Output: Bob lives in California.
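You can verify that the wrapper recorded the exchange by inspecting the in-memory store defined above. Both the human question and the model's answer should appear in the session's message list (the printed output below is illustrative):

# Inspect the history recorded for this session
for message in get_session_history("bob_session").messages:
    print(f"{message.type}: {message.content}")
# human: Where does Bob live?
# ai: Bob lives in California.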
For production-grade applications, LangGraph is the standard environment for building agents. LangGraph treats the agent as a state machine. Memory in this context is handled by "Checkpointers," which persist the state of the graph (including chat history) between interactions.
When initializing a LangGraph agent, you provide a checkpointer. This component saves the state after every node execution, allowing the agent to resume or reference past interactions effectively.
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langgraph.prebuilt import create_react_agent
from langgraph.checkpoint.memory import MemorySaver
# Define tools
@tool
def get_weather(city: str):
    """Get weather for a city."""
    return "sunny"
tools = [get_weather]
llm = ChatOpenAI(model="gpt-4o", temperature=0)
# Initialize a Checkpointer for memory persistence
# In production, use PostgresSaver or similar
memory = MemorySaver()
# Create the agent
agent_executor = create_react_agent(llm, tools, checkpointer=memory)
# Run the agent with a configuration containing a thread_id
config = {"configurable": {"thread_id": "thread_1"}}
# First interaction
response = agent_executor.invoke(
    {"messages": [("user", "My name is Clara.")]},
    config=config
)
# Second interaction - accesses memory via the thread_id
response = agent_executor.invoke(
    {"messages": [("user", "What's my name?")]},
    config=config
)
# The response object contains the full state, including the answer
print(response["messages"][-1].content)
# Output: Clara
In this architecture, you do not manually pass a memory object to a prompt placeholder. Instead, the checkpointer automatically loads the graph state (which includes the list of messages) associated with the thread_id before execution and saves the updated state afterwards.
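If you want to confirm what the checkpointer stored, compiled LangGraph graphs (including the prebuilt agent above) expose a get_state method that returns the saved snapshot for a thread. A quick check might look like this:

# Retrieve the checkpointed snapshot for the thread used above
snapshot = agent_executor.get_state(config)

# The snapshot's values include the accumulated message list for thread_1
for message in snapshot.values["messages"]:
    print(f"{message.type}: {message.content}")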
If you build a custom graph using StateGraph instead of the prebuilt agent, memory works similarly. You define the state schema (usually including a list of messages) and pass a checkpointer to the compiled graph.
# Conceptual example of running a custom graph with memory
# graph = StateGraph(StateSchema)
# ... define nodes and edges ...
# app = graph.compile(checkpointer=memory)
# app.invoke(inputs, config={"configurable": {"thread_id": "123"}})
This approach provides granular control over what is saved. For instance, you can define your state to store specific variables alongside the message history.
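As a minimal runnable sketch of the conceptual graph above, the custom state below stores a user_name field alongside the message history; State, chatbot, and user_name are illustrative names chosen here, not part of the LangGraph API. The checkpointer persists both fields for the thread:

from typing import Annotated
from typing_extensions import TypedDict
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages
from langgraph.checkpoint.memory import MemorySaver

# Custom state schema: message history plus an extra field of our own
class State(TypedDict):
    messages: Annotated[list, add_messages]
    user_name: str

llm = ChatOpenAI(model="gpt-4o", temperature=0)

def chatbot(state: State):
    # Any keys returned here are merged into the checkpointed state
    return {"messages": [llm.invoke(state["messages"])]}

graph = StateGraph(State)
graph.add_node("chatbot", chatbot)
graph.add_edge(START, "chatbot")
graph.add_edge("chatbot", END)

app = graph.compile(checkpointer=MemorySaver())

app.invoke(
    {"messages": [("user", "Hello!")], "user_name": "Clara"},
    config={"configurable": {"thread_id": "custom_1"}},
)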
To summarize how memory (checkpointing) fits into the LangGraph execution cycle: the state is loaded from the checkpointer at the start of a run, updated by each node as the graph executes, and saved back after every node execution. This ensures persistence even if the process is interrupted mid-run.
With both RunnableWithMessageHistory and LangGraph checkpointers, the session_id or thread_id is critical: this key determines which historical context is loaded. In a web application, it should be tied to the user's session or to a specific conversation topic.
When you move to a persistent checkpointer (such as PostgresSaver or SqliteSaver), concurrency is handled at the database level. Ensure your database configuration supports the expected load of simultaneous writes from different threads.
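As a rough sketch, a persistent checkpointer can be swapped in for MemorySaver with only a configuration change. This assumes the langgraph-checkpoint-postgres package and a reachable Postgres instance; the connection string below is a placeholder, and llm and tools refer to the objects defined earlier:

from langgraph.checkpoint.postgres import PostgresSaver
from langgraph.prebuilt import create_react_agent

# Placeholder connection string for illustration only
DB_URI = "postgresql://user:password@localhost:5432/langgraph"

with PostgresSaver.from_conn_string(DB_URI) as checkpointer:
    checkpointer.setup()  # create the checkpoint tables on first use
    agent_executor = create_react_agent(llm, tools, checkpointer=checkpointer)
    agent_executor.invoke(
        {"messages": [("user", "Remember that my name is Clara.")]},
        config={"configurable": {"thread_id": "thread_1"}},
    )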