While storing a verbatim history of a conversation is straightforward, it runs into a significant constraint: the context window. As a conversation lengthens, the list of messages can quickly exceed the token limit of even the most capable models, forcing either truncation, where older messages are dropped, or outright API errors. For applications requiring long-running dialogues, a more scalable approach is necessary.
This is where summary memory comes into play. Instead of retaining every message, this strategy uses a language model to progressively condense the conversation history into a running summary. This approach keeps the token count manageable while preserving the essential information from the dialogue.
The main idea is to create a new, more compact summary every few turns of conversation. Each update takes the most recent messages and combines them with the previous summary to generate a fresh one. This "rolling summary" evolves with the conversation, incorporating new information while keeping its overall length in check.
The progressive summarization process. New messages are combined with the previous summary to create an updated, compact history.
The memory module provides the create_progressive_summary function for this purpose. It takes a list of new messages and an optional existing_summary to build upon.
from kerb.core.types import Message
from kerb.memory.summaries import create_progressive_summary
# Initial messages in the conversation
messages_turn_1 = [
    Message("user", "What's the best way to learn Python for data science?"),
    Message("assistant", "Start with pandas for data manipulation and matplotlib for visualization.")
]
# Create the first summary
summary_1 = create_progressive_summary(messages_turn_1, summary_length="short")
print(f"Summary after turn 1:\n{summary_1}\n")
# New messages are added in the next turn
messages_turn_2 = [
    Message("user", "What about machine learning libraries?"),
    Message("assistant", "For machine learning, scikit-learn is a great starting point for classical algorithms.")
]
# Create a new summary using the old one as context
summary_2 = create_progressive_summary(
    messages_turn_2,
    existing_summary=summary_1,
    summary_length="short"
)
print(f"Summary after turn 2:\n{summary_2}")
You can control the verbosity of the generated summary with the summary_length parameter, which accepts "short", "medium", or "long". This allows you to balance token savings against the level of detail retained.
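For instance, the second summary from the example above could be regenerated at a longer length when more detail is worth the extra tokens. This is a small sketch reusing the messages_turn_2 and summary_1 objects defined earlier:

# Regenerate the turn-2 summary at a longer length to retain more detail.
detailed_summary = create_progressive_summary(
    messages_turn_2,
    existing_summary=summary_1,
    summary_length="long"
)
print(f"Detailed summary:\n{detailed_summary}")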
While you can manage progressive summaries manually, the ConversationBuffer class can automate this process. When you initialize a buffer with enable_summaries=True, it automatically creates summaries of older messages as they get pushed out of its memory to stay within the max_messages limit. This provides a practical and efficient way to handle long conversations without manual intervention.
Let's observe this with a small buffer that is forced to prune messages.
from kerb.memory import ConversationBuffer
# Create a small buffer that will prune messages quickly
small_buffer = ConversationBuffer(
    max_messages=5,
    enable_summaries=True
)
print(f"Adding 8 messages to a buffer with max_messages=5...\n")
for i in range(8):
small_buffer.add_message("user", f"This is message number {i+1}.")
print(f"Messages currently stored: {len(small_buffer.messages)}")
print(f"Summaries created from pruned messages: {len(small_buffer.summaries)}")
if small_buffer.summaries:
    first_summary = small_buffer.summaries[0]
    print("\nSummary of pruned messages:")
    print(f"  '{first_summary.summary}'")
    print(f"  (This summary covers {first_summary.message_count} messages)")
As messages are added, the buffer prunes the oldest ones to maintain its size limit and condenses them into a summary, which is stored in small_buffer.summaries. This summary can then be included in the context for future LLM calls.
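A minimal sketch of that assembly might look like the following. It simply concatenates the stored summaries with the messages still in the buffer; the role and content attributes on the buffered messages are assumptions here, so adapt the formatting to your own prompt structure:

# Combine the condensed history with the messages still in the buffer.
# Assumes buffered messages expose .role and .content attributes (an assumption).
context_parts = [f"Earlier in the conversation: {s.summary}" for s in small_buffer.summaries]
context_parts += [f"{m.role}: {m.content}" for m in small_buffer.messages]
context = "\n".join(context_parts)
print(context)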
An alternative to a single progressive summary is hierarchical summarization. This technique involves breaking the conversation into chunks and summarizing each chunk independently. This can be useful for very long transcripts where you might want to retrieve specific points from different parts of the conversation, rather than relying on a single, continuously evolving summary.
The create_hierarchical_summary function facilitates this by creating a list of ConversationSummary objects, one for each segment of the conversation.
from kerb.core.types import Message
from kerb.memory.summaries import create_hierarchical_summary
long_conversation = [Message("user", f"Message {i+1}") for i in range(12)]
# Create summaries for chunks of 5 messages
hierarchical_summaries = create_hierarchical_summary(long_conversation, chunk_size=5)
print(f"Created {len(hierarchical_summaries)} hierarchical summaries:\n")
for i, summary_obj in enumerate(hierarchical_summaries):
    print(f"Summary for chunk {i+1} (covering {summary_obj.message_count} messages):")
    print(f"  '{summary_obj.summary}'\n")
Summary memory is a powerful technique, but it is important to understand its trade-offs.
Benefits: the token count stays bounded regardless of how long the conversation runs, making extended dialogues feasible, while the essential information from earlier turns remains available to the model.
Drawbacks: summarization is lossy, so specific details from earlier turns may be dropped, and each summary update requires an additional language model call, which adds latency and cost.
Choosing summary memory depends on your application's requirements. For chatbots that need to handle extended dialogues, it is an indispensable tool. For tasks requiring perfect recall of previous turns, a different strategy might be more appropriate.