While ConversationBufferMemory provides perfect recall by storing every message, its effectiveness diminishes as a conversation grows. The complete history can quickly exceed the LLM's context window limit, leading to errors and increased token costs for each API call. This approach is not sustainable for applications designed for extended interactions.

To address this scaling problem, LangChain offers ConversationSummaryMemory. Instead of retaining a verbatim transcript, this component uses a language model to create a running summary of the interaction. As the conversation progresses, the summary is continuously updated to incorporate new exchanges, keeping the context concise and manageable.

## How Summarization Memory Works

The core mechanism involves an additional LLM call dedicated to condensing the conversation history. After each user-AI exchange, the memory component takes the existing summary, appends the latest messages, and asks an LLM to generate a new, updated summary. This new summary then replaces the old one in the memory buffer. When the chain is next invoked, it is this condensed summary, not the full chat log, that is passed to the main LLM along with the new user input.

This process trades perfect recall for scalability. It allows an application to maintain a sense of context over a very long conversation without ever hitting token limits.

```dot
digraph G {
    rankdir=TB;
    node [shape=box, style="rounded,filled", fontname="Arial", fontsize=10];
    edge [fontname="Arial", fontsize=9];

    subgraph cluster_0 {
        label="Interaction Flow";
        bgcolor="#e9ecef";
        user_input [label="1. User Input\n(e.g., 'What was the first topic?')", fillcolor="#a5d8ff"];
        chain [label="2. ConversationChain", fillcolor="#ced4da"];
        memory [label="3. Get Summary from Memory", fillcolor="#ffec99"];
        prompt [label="4. Construct Prompt", fillcolor="#bac8ff"];
        llm [label="5. Main LLM Call", fillcolor="#b2f2bb"];
        response [label="6. AI Response", fillcolor="#ced4da"];
        user_output [label="7. Send to User", fillcolor="#a5d8ff"];

        user_input -> chain;
        chain -> memory;
        memory -> prompt [label="Current Summary"];
        user_input -> prompt [label="New Input"];
        prompt -> llm;
        llm -> response;
        response -> user_output;
    }

    subgraph cluster_1 {
        label="Memory Update Cycle";
        bgcolor="#e9ecef";
        update_trigger [label="8. After Exchange", fillcolor="#ced4da"];
        prepare_summary [label="9. Prepare Data for Summarizer", fillcolor="#ffec99"];
        summarizer_llm [label="10. Summarizer LLM Call", fillcolor="#fcc2d7"];
        new_summary [label="11. Store New Summary", fillcolor="#ffd8a8"];

        response -> update_trigger [style=dashed];
        update_trigger -> prepare_summary [label="Old Summary + New Messages"];
        prepare_summary -> summarizer_llm;
        summarizer_llm -> new_summary;
    }

    user_output -> update_trigger [style=invis];
}
```

The flow for an application using summarization memory. A separate LLM call is made after the main interaction to update the conversation summary.

## Implementing ConversationSummaryMemory

Putting this into practice is similar to using other memory types, with one important difference: you must provide an LLM to the memory object itself, since it needs one to perform the summarization.

Let's set up a ConversationChain that uses ConversationSummaryMemory.
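Before wiring this into LangChain, the update cycle itself can be sketched in a few lines of plain Python. Both `summarize` and `RunningSummaryMemory` below are illustrative stand-ins, not LangChain APIs: `summarize` fakes the summarizer LLM call by truncating to a fixed budget, where a real implementation would prompt a model with the old summary plus the new exchange.

```python
def summarize(old_summary: str, new_lines: str) -> str:
    """Stub for the summarizer LLM call: pretend the model
    condenses the combined text to a fixed length budget."""
    combined = (old_summary + " " + new_lines).strip()
    return combined[:200]


class RunningSummaryMemory:
    """Minimal sketch of a running-summary memory buffer."""

    def __init__(self):
        self.buffer = ""  # the current summary

    def load_context(self) -> str:
        # What gets injected into the main prompt: the summary, not the log.
        return self.buffer

    def save_context(self, human: str, ai: str) -> None:
        # After each exchange, replace the old summary with an updated one.
        new_lines = f"Human: {human}\nAI: {ai}"
        self.buffer = summarize(self.buffer, new_lines)


memory = RunningSummaryMemory()
memory.save_context("Hi, my name is Alex.", "Hello Alex!")
memory.save_context("I like reinforcement learning.", "Great topic!")
print(memory.load_context())  # one condensed string, not a growing transcript
```

However long the conversation runs, the prompt context stays bounded by the summary's size; that is the entire idea behind ConversationSummaryMemory.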
We will print the contents of the memory buffer after each interaction to observe how the summary evolves.

```python
import os

from langchain_openai import OpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationSummaryMemory

# Set up your OpenAI API key
# os.environ["OPENAI_API_KEY"] = "YOUR_API_KEY"

# Initialize the LLM used by both the chain and the memory's summarizer
llm = OpenAI(temperature=0)

# Initialize ConversationSummaryMemory
# The LLM is required to generate the summary
summary_memory = ConversationSummaryMemory(llm=llm)

# Create the ConversationChain
conversation_with_summary = ConversationChain(
    llm=llm,
    memory=summary_memory,
    verbose=True  # Set to True to see the prompt being sent to the LLM
)

# First interaction
conversation_with_summary.predict(
    input="Hi, my name is Alex and I'm interested in machine learning."
)
print(f"Memory Buffer:\n{summary_memory.buffer}\n")

# Second interaction
conversation_with_summary.predict(
    input="I'm particularly interested in reinforcement learning. "
          "Can you suggest a good starting point?"
)
print(f"Memory Buffer:\n{summary_memory.buffer}\n")

# Third interaction
conversation_with_summary.predict(input="That sounds great. What's my name?")
print(f"Memory Buffer:\n{summary_memory.buffer}\n")
```

When you run this code, pay close attention to the verbose output from the chain and the printed memory buffer.

**First interaction:** The initial prompt is simple. After the exchange, the memory buffer contains a new summary.

```text
Memory Buffer:
The human introduces themselves as Alex and expresses an interest in machine
learning. The AI responds by offering to provide information on the topic.
```

**Second interaction:** Notice that the prompt sent to the LLM now includes the summary of the first interaction, not the raw text. After the second exchange, the summary is updated again.

```text
Memory Buffer:
The human, Alex, is interested in machine learning, specifically reinforcement
learning. The AI suggests starting with the book "Reinforcement Learning: An
Introduction" by Sutton and Barto.
```

**Third interaction:** The LLM correctly recalls the user's name because it was preserved in the running summary.

```text
> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is
talkative and provides lots of specific details from its context. If the AI
does not know the answer to a question, it truthfully says it does not know.

Current conversation:
The human, Alex, is interested in machine learning, specifically reinforcement
learning. The AI suggests starting with the book "Reinforcement Learning: An
Introduction" by Sutton and Barto.
Human: That sounds great. What's my name?
AI:

> Finished chain.
Your name is Alex.

Memory Buffer:
The human, Alex, is interested in machine learning, specifically reinforcement
learning. The AI suggests starting with the book "Reinforcement Learning: An
Introduction" by Sutton and Barto. Alex then asks the AI to recall his name,
which the AI correctly identifies as Alex.
```

## Trade-offs

While effective, ConversationSummaryMemory introduces two primary trade-offs:

- **Increased latency:** Because each conversational turn requires an extra LLM call to update the summary, the overall response time for the user can be slightly longer compared to using a simple buffer.
- **Potential information loss:** The summarization process is inherently lossy. An LLM might fail to capture a subtle but important detail from an earlier part of the conversation. The quality of the summary depends on the capability of the LLM used for the summarization task.
For most use cases, this is acceptable, but it's a factor to consider for applications requiring high-fidelity recall of specific details.

This memory type is well-suited for applications like customer support bots, long-form research assistants, or any scenario where maintaining the general context of a lengthy dialogue matters more than remembering every single word. It provides a practical solution for building stateful applications that can gracefully handle extended interactions.
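To make the latency trade-off concrete, here is a rough sketch in plain Python. The `CallCounter` class and the two `run_turn_*` functions are hypothetical stand-ins (not LangChain APIs) that simply count model invocations: a plain buffer costs one LLM call per turn, while summarization memory costs two.

```python
from dataclasses import dataclass


@dataclass
class CallCounter:
    """Stand-in for an LLM client that just counts invocations."""
    calls: int = 0

    def invoke(self, prompt: str) -> str:
        self.calls += 1  # a real client would make a network request here
        return "response"


def run_turn_with_buffer(llm: CallCounter) -> None:
    llm.invoke("full history + new input")       # main call only


def run_turn_with_summary(llm: CallCounter) -> None:
    llm.invoke("summary + new input")            # main call
    llm.invoke("old summary + new messages")     # extra summarizer call


buffer_llm, summary_llm = CallCounter(), CallCounter()
for _ in range(10):  # a ten-turn conversation
    run_turn_with_buffer(buffer_llm)
    run_turn_with_summary(summary_llm)

print(buffer_llm.calls, summary_llm.calls)  # 10 20
```

Doubling the number of model calls roughly doubles per-turn latency and adds token cost for the summarizer prompts, which is the price paid for keeping the context bounded.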