Let's put theory into practice by setting up some basic tests for a LangChain chain. This exercise focuses on verifying the structural integrity and expected flow of the chain, rather than the nuanced quality of the LLM's output, which requires the evaluation techniques discussed earlier. We will use pytest, a popular Python testing framework, and mocking to isolate our chain from actual LLM API calls during testing.
Imagine we have a simple LangChain chain designed to take a piece of text and summarize it into three bullet points. It might involve:
- A PromptTemplate to instruct the LLM.
- An OutputParser (perhaps SimpleJsonOutputParser or a custom one) to structure the bullet points.
Here's a hypothetical structure for our chain using LangChain Expression Language (LCEL):
# Assume necessary imports: ChatOpenAI, ChatPromptTemplate, StrOutputParser, etc.
# Assume OPENAI_API_KEY is set in the environment
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from operator import itemgetter
# Simplified example - A real chain might use a JsonOutputParser for better structure
prompt_template = ChatPromptTemplate.from_template(
    "Summarize the following text into exactly three concise bullet points:\n\n{text}\n\nFormat the output as a numbered list."
)
model = ChatOpenAI(model="gpt-3.5-turbo")  # Or your preferred model
output_parser = StrOutputParser()  # Simple parser for this example
# Define the chain
summary_chain = (
    {"text": itemgetter("text")}
    | prompt_template
    | model
    | output_parser
)
# Example usage (not part of the test itself)
# input_text = "Large Language Models are transforming industries by enabling natural language interaction..."
# result = summary_chain.invoke({"text": input_text})
# print(result)
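As the comment above suggests, a real chain would often use a JsonOutputParser so that downstream code receives structured data instead of raw text. A minimal sketch of that variant, reusing the model defined above, might look like the following (the prompt wording and the json_prompt / summary_chain_json names are illustrative assumptions, not part of the chain under test):
# Hypothetical JSON variant (sketch only) - asks the model for JSON and parses it into a dict
from langchain_core.output_parsers import JsonOutputParser
json_prompt = ChatPromptTemplate.from_template(
    "Summarize the following text into exactly three concise bullet points.\n\n{text}\n\n"
    'Return only JSON of the form {{"summary": ["point 1", "point 2", "point 3"]}}.'
)
summary_chain_json = (
    {"text": itemgetter("text")}
    | json_prompt
    | model
    | JsonOutputParser()
)
# summary_chain_json.invoke({"text": "..."}) would return a dict such as {"summary": [...]}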
First, ensure you have pytest and pytest-mock installed:
pip install pytest pytest-mock langchain langchain_openai python-dotenv
We'll structure our tests in a separate file, say test_summary_chain.py, and save the chain definition above as summary_chain_module.py so the tests can import it. We also need a way to manage API keys securely; using environment variables (and perhaps a .env file loaded by python-dotenv for local testing) is common practice.
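One common way to wire this up is a small conftest.py that loads the .env file before pytest imports the test modules; the layout below is just a convention, not a LangChain requirement:
# conftest.py (illustrative sketch) - pytest loads this before the test modules in the directory
from dotenv import load_dotenv
# OPENAI_API_KEY can be a placeholder value for these tests, since the model call is mocked,
# but ChatOpenAI generally expects a key to be available when the chain module is imported.
load_dotenv()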
Directly calling the LLM API in tests makes them slow, expensive, and potentially non-deterministic. The core idea for unit and integration testing the chain's structure is to mock the LLM call. We want to verify that our prompt is constructed correctly and that the chain processes the LLM's expected response format appropriately.
Python's built-in unittest.mock library (or the mocker fixture provided by pytest-mock) is excellent for this.
Let's create test_summary_chain.py:
import pytest
from langchain_core.messages import AIMessage  # Used to mimic the chat model's response type
from operator import itemgetter
# Assuming your chain definition is in a file named `summary_chain_module.py`
from summary_chain_module import summary_chain, prompt_template, model
# Test the prompt template formatting
def test_prompt_template_formatting():
    """Verify the prompt template inserts the text correctly."""
    sample_text = "This is sample input text."
    expected_prompt_value = "Summarize the following text into exactly three concise bullet points:\n\nThis is sample input text.\n\nFormat the output as a numbered list."
    # Build just the input-mapping and prompt template portion of the chain
    prompt_part = {"text": itemgetter("text")} | prompt_template
    result = prompt_part.invoke({"text": sample_text})
    # The result is a ChatPromptValue; compare the content of its single human message
    # (to_string() would prepend a "Human:" role prefix, so it is not a direct match)
    assert result.to_messages()[0].content == expected_prompt_value
# Test the full chain with a mocked LLM
def test_summary_chain_with_mock_llm(mocker):  # Use pytest-mock's 'mocker' fixture
    """Verify the chain processes a mocked LLM response correctly."""
    sample_text = "This is the text to be summarized."
    mock_llm_output = "1. First point.\n2. Second point.\n3. Third point."
    # Mock the model's 'invoke' method so the chain never calls the real API.
    # We patch it on the model's class: LangChain models are pydantic objects,
    # which generally reject setting unknown attributes directly on the instance.
    mock_model_invoke = mocker.patch.object(type(model), "invoke")
    # StrOutputParser reads '.content' from the chat model's message, so return an AIMessage
    mock_model_invoke.return_value = AIMessage(content=mock_llm_output)
    # Invoke the actual chain
    result = summary_chain.invoke({"text": sample_text})
    # Assert that the prompt was passed correctly to the mocked model:
    # the first positional argument of the first call is the formatted prompt value
    call_args = mock_model_invoke.call_args[0][0]
    expected_prompt = prompt_template.invoke({"text": sample_text})
    assert call_args == expected_prompt
    # Assert that the final output is the mocked LLM output processed by the parser
    assert result == mock_llm_output
# Test the output parser logic (if it were more complex)
# For StrOutputParser, there isn't much to test, but if using JsonOutputParser:
# def test_output_parser():
# mock_llm_response_content = '{"summary": ["Point 1", "Point 2", "Point 3"]}'
# # Assume json_output_parser = JsonOutputParser(...)
# # parsed_output = json_output_parser.parse(mock_llm_response_content)
# # assert parsed_output == {"summary": ["Point 1", "Point 2", "Point 3"]}
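To make that commented sketch concrete, here is what such a test could look like if the chain used a JsonOutputParser; it is illustrative and assumes the JSON shape shown in the comment:
# Hypothetical standalone parser test (only relevant if the chain uses JsonOutputParser)
from langchain_core.output_parsers import JsonOutputParser
def test_json_output_parser():
    parser = JsonOutputParser()
    mock_llm_response_content = '{"summary": ["Point 1", "Point 2", "Point 3"]}'
    parsed_output = parser.parse(mock_llm_response_content)
    assert parsed_output == {"summary": ["Point 1", "Point 2", "Point 3"]}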
Navigate to the directory containing test_summary_chain.py in your terminal and run pytest:
pytest
You should see output indicating whether your tests passed or failed.
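If you prefer more detail, pytest's verbose flag lists each test by name along with its result:
pytest -v test_summary_chain.py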
These tests verify:
- test_prompt_template_formatting ensures our template correctly incorporates input variables.
- test_summary_chain_with_mock_llm confirms that the input flows through the prompt template to the (mocked) model, and that the (mocked) model's output is correctly processed by the output parser. It checks the connections and data flow within the chain.
Remember, these tests do not evaluate whether the mock_llm_output is a good summary. That requires the evaluation strategies discussed previously, such as using evaluation datasets, comparing against reference summaries (e.g., ROUGE scores), or employing LLM-based evaluation.
This practice forms a fundamental layer of testing, ensuring your chain is structurally sound before moving on to more complex quality assessments. It catches errors in logic, parsing, and component integration early in the development cycle.