While storing a verbatim history of a conversation is straightforward, it runs into a significant constraint: the context window. As a conversation lengthens, the list of messages can quickly exceed the token limit of even the most capable models, forcing either truncation, where older messages are dropped, or outright API errors. For applications requiring long-running dialogues, a more scalable approach is necessary.
This is where summary memory comes into play. Instead of retaining every message, this strategy uses a language model to progressively condense the conversation history into a running summary. This approach keeps the token count manageable while preserving the essential information from the dialogue.
The main idea is to create a new, more compact summary every few turns of conversation. Each update takes the most recent messages and combines them with the previous summary to generate a fresh one. This "rolling summary" evolves with the conversation, incorporating new information while keeping its overall length in check.
The progressive summarization process. New messages are combined with the previous summary to create an updated, compact history.
The memory module provides the create_progressive_summary function for this purpose. It takes a list of new messages and an optional existing_summary to build upon.
from kerb.core.types import Message
from kerb.memory.summaries import create_progressive_summary
# Initial messages in the conversation
messages_turn_1 = [
    Message("user", "What's the best way to learn Python for data science?"),
    Message("assistant", "Start with pandas for data manipulation and matplotlib for visualization.")
]
# Create the first summary
summary_1 = create_progressive_summary(messages_turn_1, summary_length="short")
print(f"Summary after turn 1:\n{summary_1}\n")
# New messages are added in the next turn
messages_turn_2 = [
    Message("user", "What about machine learning libraries?"),
    Message("assistant", "For machine learning, scikit-learn is a great starting point for classical algorithms.")
]
# Create a new summary using the old one as context
summary_2 = create_progressive_summary(
    messages_turn_2,
    existing_summary=summary_1,
    summary_length="short"
)
print(f"Summary after turn 2:\n{summary_2}")
You can control the verbosity of the generated summary with the summary_length parameter, which accepts "short", "medium", or "long". This allows you to balance token savings against the level of detail retained.
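For instance, the second summary from the example above could be regenerated at a longer length when more detail is worth the extra tokens. This is a small sketch reusing the messages_turn_2 and summary_1 objects defined earlier:

# Regenerate the turn-2 summary at a longer length to retain more detail.
detailed_summary = create_progressive_summary(
    messages_turn_2,
    existing_summary=summary_1,
    summary_length="long"
)
print(f"Detailed summary:\n{detailed_summary}")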
While you can manage progressive summaries manually, the ConversationBuffer class can automate this process. When you initialize a buffer with enable_summaries=True, it automatically creates summaries of older messages as they get pushed out of its memory to stay within the max_messages limit. This provides a practical and efficient way to handle long conversations without manual intervention.
Let's observe this with a small buffer that is forced to prune messages.
from kerb.memory import ConversationBuffer
# Create a small buffer that will prune messages quickly
small_buffer = ConversationBuffer(
    max_messages=5,
    enable_summaries=True
)
print(f"Adding 8 messages to a buffer with max_messages=5...\n")
for i in range(8):
small_buffer.add_message("user", f"This is message number {i+1}.")
print(f"Messages currently stored: {len(small_buffer.messages)}")
print(f"Summaries created from pruned messages: {len(small_buffer.summaries)}")
if small_buffer.summaries:
    first_summary = small_buffer.summaries[0]
    print("\nSummary of pruned messages:")
    print(f"  '{first_summary.summary}'")
    print(f"  (This summary covers {first_summary.message_count} messages)")
As messages are added, the buffer prunes the oldest ones to maintain its size limit and condenses them into a summary, which is stored in small_buffer.summaries. This summary can then be included in the context for future LLM calls.
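A minimal sketch of that assembly might look like the following. It simply concatenates the stored summaries with the messages still in the buffer; the role and content attributes on the buffered messages are assumptions here, so adapt the formatting to your own prompt structure:

# Combine the condensed history with the messages still in the buffer.
# Assumes buffered messages expose .role and .content attributes (an assumption).
context_parts = [f"Earlier in the conversation: {s.summary}" for s in small_buffer.summaries]
context_parts += [f"{m.role}: {m.content}" for m in small_buffer.messages]
context = "\n".join(context_parts)
print(context)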
An alternative to a single progressive summary is hierarchical summarization. This technique involves breaking the conversation into chunks and summarizing each chunk independently. This can be useful for very long transcripts where you might want to retrieve specific points from different parts of the conversation, rather than relying on a single, continuously evolving summary.
The create_hierarchical_summary function facilitates this by creating a list of ConversationSummary objects, one for each segment of the conversation.
from kerb.core.types import Message
from kerb.memory.summaries import create_hierarchical_summary
long_conversation = [Message("user", f"Message {i+1}") for i in range(12)]
# Create summaries for chunks of 5 messages
hierarchical_summaries = create_hierarchical_summary(long_conversation, chunk_size=5)
print(f"Created {len(hierarchical_summaries)} hierarchical summaries:\n")
for i, summary_obj in enumerate(hierarchical_summaries):
    print(f"Summary for chunk {i+1} (covering {summary_obj.message_count} messages):")
    print(f"  '{summary_obj.summary}'\n")
Summary memory is a powerful technique, but it is important to understand its trade-offs.
Benefits: the token count stays bounded regardless of how long the conversation runs, making extended dialogues feasible, while the essential information from earlier turns remains available to the model.
Drawbacks: summarization is lossy, so specific details from earlier turns may be dropped, and each summary update requires an additional language model call, which adds latency and cost.
Choosing summary memory depends on your application's requirements. For chatbots that need to handle extended dialogues, it is an indispensable tool. For tasks requiring perfect recall of previous turns, a different strategy might be more appropriate.