The modern approach to making an application stateful is to wrap the chain execution with history management. Using the LangChain Expression Language (LCEL), we define a chain and then enhance it with RunnableWithMessageHistory. This component manages the conversation history, automatically reading previous messages before the prompt is executed and appending the new exchange to the history afterward.
To make this work, the setup requires:
- A ChatPromptTemplate containing a MessagesPlaceholder for the conversation history.
- A get_session_history function that retrieves or creates the message history object for a given session ID.

Let's look at a practical implementation. We will create a simple chain using ChatOpenAI and add memory to it.
import os
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
# Set up the language model
# Make sure your OPENAI_API_KEY is set in your environment
llm = ChatOpenAI(temperature=0.7)
# 1. Define the ChatPromptTemplate
# We use MessagesPlaceholder to insert the chat history dynamically.
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a friendly chatbot having a conversation with a human."),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{input}"),
])
# 2. Create the Chain using LCEL
chain = prompt | llm
# 3. Define the memory management
# We need a function to retrieve the history for a specific session.
store = {}
def get_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]
# 4. Wrap the chain with message history
conversation_chain = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history",
)
# First interaction
# We must provide a config with a session_id to persist history
response_1 = conversation_chain.invoke(
    {"input": "Hi there! My name is Alex."},
    config={"configurable": {"session_id": "session_1"}}
)
print(response_1.content)
# > Hi Alex! It's nice to meet you. My name is AI. How can I help you today?
# Second interaction
response_2 = conversation_chain.invoke(
    {"input": "What was the name I just told you?"},
    config={"configurable": {"session_id": "session_1"}}
)
print(response_2.content)
# > You told me your name is Alex.
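Because histories are keyed by session ID, conversations are isolated from one another. As a quick sketch using the same conversation_chain and in-memory store defined above (the session ID "session_2" is just an illustrative value), invoking with a new session ID starts from an empty history:
# Third interaction, using a fresh session ID
# "session_2" has no prior messages, so the model cannot recall the name
response_3 = conversation_chain.invoke(
    {"input": "What was the name I just told you?"},
    config={"configurable": {"session_id": "session_2"}}
)
print(response_3.content)
# Expected: a reply explaining that no name has been shared in this conversation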
The following diagram illustrates this data flow. The RunnableWithMessageHistory wrapper acts as a stateful manager that reads from and writes to the memory store during each execution.
The flow within a stateful chain execution. On each run, the wrapper provides past context to the prompt, and after the LLM generates a response, the memory is updated with the latest turn.
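To see the write side of this flow concretely, you can inspect the history object that get_session_history returns for "session_1" after the two interactions above. This is a minimal sketch relying on the messages attribute that ChatMessageHistory exposes:
# Inspect what the wrapper has written back to the in-memory store
history = get_session_history("session_1")
for message in history.messages:
    # Each turn appends a human message and an AI message
    print(f"{message.type}: {message.content}")
# Expected order after two exchanges: human, ai, human, ai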
Agents, which use an LLM to make decisions about which tools to use, also require memory to handle multi-turn interactions. For example, a user might ask an agent to look something up online and then ask a follow-up question about the results. Without memory, the agent would have no context for the second question.
In modern LangChain, we define the agent and then execute it using an AgentExecutor. The memory is typically managed by passing a memory object to the executor. This allows the agent to maintain context across different steps and interactions.
Here is an example of a conversational agent that uses a search tool and remembers previous interactions.
from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_tool_calling_agent, load_tools
from langchain.memory import ConversationBufferMemory
from langchain import hub
# Initialize the model
llm = ChatOpenAI(temperature=0)
# Load tools
# Using 'ddg-search' requires the duckduckgo-search package
tools = load_tools(["ddg-search"], llm=llm)
# Pull a standard prompt for tool calling agents
# This prompt includes a placeholder for 'chat_history'
prompt = hub.pull("hwchase17/openai-tools-agent")
# Initialize the agent
agent = create_tool_calling_agent(llm, tools, prompt)
# Initialize memory
# 'return_messages=True' is important for Chat Models
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
# Initialize the AgentExecutor
# We pass the memory to the executor
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    memory=memory,
    verbose=True
)
# First interaction
agent_executor.invoke({"input": "Who is the current CEO of NVIDIA?"})
# Second interaction, referencing the first one
agent_executor.invoke({"input": "What year was that company founded?"})
In the second call to agent_executor.invoke(), the agent knows that "that company" refers to NVIDIA. This is because the AgentExecutor automatically loads the context from the ConversationBufferMemory and includes it in the prompt sent to the LLM. The LLM then sees the previous question and answer, allowing it to correctly resolve the reference and use the search tool to find the founding year of NVIDIA.
The mechanism relies on the prompt having a specific placeholder (usually chat_history) which the AgentExecutor populates from the memory object before invoking the agent. This simple configuration transforms a stateless tool-using system into a capable conversational assistant.
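To confirm what the executor injects, you can read the buffer back out of the memory object after the two calls. This is a small sketch using ConversationBufferMemory's load_memory_variables method; with return_messages=True it returns the accumulated messages under the chat_history key:
# Inspect the chat history the AgentExecutor will prepend on the next turn
chat_history = memory.load_memory_variables({})["chat_history"]
for message in chat_history:
    print(f"{message.type}: {message.content}")
# Both earlier questions and the agent's final answers appear here,
# which is why "that company" could be resolved to NVIDIA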