The most direct way to give an application memory is to store the entire conversation history and include it with every subsequent request. This strategy is often referred to as Buffer Memory. It maintains a verbatim log of all user inputs and model outputs, ensuring the LLM has complete context for the next turn in the dialogue.
This approach is simple and effective for short conversations where recalling specific details is important. In modern LangChain applications, this is implemented by managing a chat history store and injecting the full list of messages into the prompt template for each new interaction.
The mechanism is straightforward. Messages are collected and stored in a history object. When the chain is executed, this list of past messages is retrieved and inserted into the prompt sent to the LLM, typically using a placeholder variable. This provides the model with the full conversational thread.
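Before wiring this into a chain, it helps to see the buffer in isolation. The sketch below uses LangChain's message classes but no chain machinery; it is only meant to show that the buffer is an ordered list of message objects that grows with each turn:

from langchain_core.messages import AIMessage, HumanMessage

# The buffer is simply an ordered list of message objects
history = []

# After each turn, both sides of the exchange are appended verbatim
history.append(HumanMessage(content="Hi, I'm Alex."))
history.append(AIMessage(content="Hello Alex! How can I help?"))

# On the next call, this entire list is inserted into the prompt
# wherever the history placeholder appears
print(history)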
The following diagram illustrates this cyclical process. Each time the user sends a message, the system updates the message history, which is then used to construct the prompt for the next LLM call.
The flow of a conversational chain with buffer memory. The history store is updated with the latest exchange and then used to populate the context for the next prompt.
Let's put this into practice. To implement this pattern, we use RunnableWithMessageHistory to wrap our chain. This component handles the logic of reading from and writing to the history store automatically.
First, ensure you have your environment variables set up, for example, your OPENAI_API_KEY.
import os

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_core.output_parsers import StrOutputParser

# Set your API key
# os.environ["OPENAI_API_KEY"] = "YOUR_API_KEY"

# 1. Initialize the LLM
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.7)

# 2. Create the prompt template
# MessagesPlaceholder injects the conversation history into the prompt
prompt = ChatPromptTemplate.from_messages([
    ("system", "The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}"),
])

# 3. Create the chain
chain = prompt | llm | StrOutputParser()

# 4. Set up memory management
# A dictionary stores a separate history object for each session
store = {}

def get_session_history(session_id: str) -> InMemoryChatMessageHistory:
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]

# Wrap the chain with message history functionality
conversation = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="history",
)

# Start the conversation
# A session_id must be provided to track separate conversations
config = {"configurable": {"session_id": "alex_session"}}

response1 = conversation.invoke(
    {"input": "Hi, my name is Alex. What's the biggest planet in our solar system?"},
    config=config,
)
print(response1)

# Continue the conversation
response2 = conversation.invoke(
    {"input": "Great, and what's its most famous feature?"},
    config=config,
)
print(response2)

# The model remembers the context ("Jupiter")
response3 = conversation.invoke(
    {"input": "What was the name I gave you earlier?"},
    config=config,
)
print(response3)
When you run this code, RunnableWithMessageHistory looks up the session history (creating it if it doesn't exist) and injects the stored messages into the history placeholder. For the second turn, the system constructs a prompt similar to this:
[System Message]: The following is a friendly conversation between a human and an AI...
[Human Message]: Hi, my name is Alex. What's the biggest planet in our solar system?
[AI Message]: Hello Alex! The biggest planet in our solar system is Jupiter...
[Human Message]: Great, and what's its most famous feature?
The history object has appended the first human-AI exchange, providing the necessary context for the model to understand that the new question refers to Jupiter.
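You can verify this yourself by inspecting the buffer directly. The snippet below assumes the store dictionary and session_id from the example above; InMemoryChatMessageHistory exposes the accumulated messages through its messages attribute:

# Inspect the raw buffer for this session
for message in store["alex_session"].messages:
    print(f"{message.type}: {message.content}")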
The primary advantage of this buffer memory strategy is its perfect recall. The model receives the full, unaltered history, which minimizes the risk of misunderstanding context in short to medium-length dialogues.
The main limitation, however, is token consumption. As the conversation continues, the stored history grows linearly. This has two direct consequences:

- Every request re-sends the full history, so API cost and latency increase with each turn.
- The history will eventually exceed the model's context window, at which point older messages must be truncated or the request will fail.
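To see this growth concretely, you can measure the buffer's token footprint after each turn. This sketch uses get_num_tokens_from_messages, a helper ChatOpenAI provides for counting the tokens in a list of messages, together with the store from the example above:

# Roughly measure how many tokens the buffer adds to the next request
messages = store["alex_session"].messages
token_count = llm.get_num_tokens_from_messages(messages)
print(f"Buffer: {len(messages)} messages, ~{token_count} tokens")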
Because of this limitation, this full-buffer approach is best suited for applications where conversations are expected to be relatively brief, such as a customer support chatbot handling a single issue or a simple task-oriented assistant. For applications requiring long-term memory, you will need more advanced strategies, which we will cover next.