At their core, Large Language Models are powerful text completion engines, but they operate on a per-transaction basis. An LLM has no intrinsic memory of past interactions. If you ask it "What is the capital of France?" and then follow up with "What is its population?", the model has no idea that "its" refers to Paris. Each query is treated as a completely new, isolated event, disconnected from all previous ones. This stateless nature is a significant hurdle in building applications that require a continuous dialogue, such as chatbots, virtual assistants, or any system that needs to understand context from a series of exchanges.
Imagine a conversation where you have to re-introduce every topic and person in every sentence. It would be inefficient and unnatural. This is the default experience when interacting directly with an LLM API. The model processes the input you provide in a single call and returns an output. It doesn't retain any information from that call to inform the next one.
Let's illustrate with a simple exchange.
Turn 1:
User: "My name is Alex and I'm a software developer."
Model: "Nice to meet you, Alex! How can I help with your software development work?"
Turn 2 (in a stateless system):
User: "What's my name?"
Model: "I'm sorry, I don't know your name. Could you tell me?"
The model failed because the context from the first turn was lost. The second API call was entirely independent of the first, containing only the text "What's my name?". To the model, this question appeared out of nowhere.
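You can reproduce this behavior with two independent calls to a chat model. The sketch below is illustrative: it assumes a LangChain chat model such as ChatOpenAI from the langchain_openai package, and the model name is an arbitrary choice. Any chat model invoked this way behaves the same.

from langchain_openai import ChatOpenAI  # assumed provider integration

llm = ChatOpenAI(model="gpt-4o-mini")  # hypothetical model choice

# Turn 1: a single, self-contained call
first_reply = llm.invoke("My name is Alex and I'm a software developer.")

# Turn 2: a brand-new call; nothing from turn 1 is carried over
second_reply = llm.invoke("What's my name?")
print(second_reply.content)  # The model cannot answer; it never saw turn 1

Because each invoke call sends only the text you pass in, the second call contains no trace of the first.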
The following diagram shows the difference between a stateless interaction and a stateful one. In a stateless setup, each user query is a separate, disconnected call. In a stateful application, a memory component preserves context between calls, enabling a coherent conversation.
In a stateless interaction, each query is isolated. In a stateful interaction, an application component manages memory, providing the LLM with the necessary context from previous turns.
The most direct way to solve this is to manually manage the state. You could store the conversation history in a list and send the full history with every API call. This mirrors how modern Chat Models expect a list of messages (user, assistant, system) rather than a single string.
# A simplified manual approach to managing state
conversation_history = []

def get_llm_response(user_query):
    # Add the new user message to the history
    conversation_history.append({"role": "user", "content": user_query})

    # Call the LLM with the full context (pseudo-code)
    # response_obj = chat_model.invoke(conversation_history)
    response_text = "Your name is Alex."  # Simulated response

    # Add the model's response to the history
    conversation_history.append({"role": "assistant", "content": response_text})
    return response_text

# First turn
get_llm_response("My name is Alex and I'm a software developer.")

# Second turn
# The history now contains the first exchange
print(get_llm_response("What's my name?"))
# Expected output: Your name is Alex.
While this works for short conversations, it introduces two significant problems as the dialogue grows:

- Context window limits: every model can process only a fixed number of tokens per call. As the history grows, it eventually exceeds that limit, and the request fails unless older messages are dropped.
- Cost and latency: the full history is re-sent with every call, so each turn consumes more tokens, costs more, and takes longer to process.
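Working around these problems by hand means adding your own bookkeeping on top of the history list. The sketch below is a hypothetical illustration of that burden: it trims the history to the most recent messages before each call, using a made-up max_messages threshold rather than any real token counting.

# A naive, hand-rolled mitigation: keep only the most recent messages.
# max_messages is an arbitrary threshold chosen for illustration.
max_messages = 10

def trim_history(history, limit=max_messages):
    # Drop the oldest messages once the history grows past the threshold
    return history[-limit:]

# Before each call you would trim (and ideally also count tokens):
# context = trim_history(conversation_history)
# response_obj = chat_model.invoke(context)

Even this simple policy forces you to decide what to drop and when, and it still does nothing about counting tokens or summarizing older turns.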
This is exactly the issue LangChain's memory utilities address. They provide a standardized interface for storing, retrieving, and managing conversational history. Instead of manually constructing message lists and tracking tokens, you can integrate these components into your application to handle state management automatically. The following sections will show you how to implement different memory strategies to build effective conversational applications.
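As a brief preview, here is a minimal sketch of that idea using ConversationBufferMemory with ConversationChain and a chat model from langchain_openai. The specific classes and model name are illustrative choices; later sections cover the available memory components and their trade-offs in detail.

from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
from langchain_openai import ChatOpenAI  # assumed provider integration

# The memory object stores the running message history for us
conversation = ConversationChain(
    llm=ChatOpenAI(model="gpt-4o-mini"),  # hypothetical model choice
    memory=ConversationBufferMemory(),
)

conversation.predict(input="My name is Alex and I'm a software developer.")
print(conversation.predict(input="What's my name?"))
# The memory supplies the first turn as context, so the model can answer "Alex".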