One of the most significant challenges in building reliable LLM applications is the non-deterministic nature of the models themselves. The same prompt can produce slightly different outputs on subsequent runs, making traditional unit testing difficult. Standard tests rely on predictable, repeatable outcomes to verify that code is working as expected. Furthermore, running tests that make live API calls is slow, expensive, and dependent on external services.

To address this, you can use mock objects to simulate the behavior of an LLM. A mock replaces the real LLM client in your test environment with a predictable stand-in. This allows you to write fast, deterministic, and cost-free unit tests that verify your application's logic without ever hitting a live API.

The testing module provides a `MockLLM` class designed for this purpose. It allows you to configure predefined responses and inspect how your application interacts with the LLM.

### Testing with Fixed Responses

The simplest way to use a mock is to have it return the same fixed response every time it is called. This is useful for testing a single, specific behavior in your application. To do this, you configure the `MockLLM` with a string response and set its behavior to `MockBehavior.FIXED`.

First, let's define a simple function that takes an LLM client and uses it to summarize text. Designing your functions to accept an LLM client as a parameter, a practice known as dependency injection, is what makes them testable.

```python
def summarize_text(llm, text_to_summarize: str) -> str:
    """Summarizes text using the provided LLM client."""
    prompt = f"Please summarize the following text concisely:\n\n{text_to_summarize}"
    response = llm.generate(prompt)
    return response.content
```

Now we can test this function using a `MockLLM` instance. We will configure it to return a predictable summary, allowing us to verify our function's output without making a real API call.

```python
from kerb.testing import MockLLM, MockBehavior

# Configure a mock LLM with a fixed response
mock_llm = MockLLM(
    responses="This is the expected summary.",
    behavior=MockBehavior.FIXED
)

# Call our function with the mock
article_text = "A long article about artificial intelligence..."
summary = summarize_text(mock_llm, article_text)
print(f"Generated Summary: {summary}")

# Assert that the function returned the mock's response
assert summary == "This is the expected summary."

# Verify that the mock was called
mock_llm.assert_called()
print("Assertion passed: The mock LLM was called as expected.")
```

This test runs instantly, costs nothing, and will always pass as long as `summarize_text` correctly calls the `generate` method and returns its content.
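The same scenario fits naturally into an ordinary test runner. The following is a minimal sketch assuming pytest; the file name, test function name, and the `my_app` module that holds `summarize_text` are hypothetical placeholders for wherever the function lives in your project.

```python
# test_summarizer.py -- a minimal sketch assuming pytest.
# "my_app" is a hypothetical module containing the summarize_text
# function defined above.
from kerb.testing import MockLLM, MockBehavior

from my_app import summarize_text


def test_summarize_text_returns_mock_content():
    # Arrange: a mock that always returns the same summary
    mock_llm = MockLLM(
        responses="This is the expected summary.",
        behavior=MockBehavior.FIXED,
    )

    # Act: call the function under test with the mock injected
    summary = summarize_text(mock_llm, "A long article about AI...")

    # Assert: our function returned the mock's content and the LLM was called
    assert summary == "This is the expected summary."
    mock_llm.assert_called()
```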
### Simulating Conversational Flows

For applications like chatbots, you need to test multi-turn conversations where the LLM's response changes with each turn. You can simulate this using `MockBehavior.SEQUENTIAL`, which provides a list of responses that the mock returns in order.

Let's define a function that classifies sentiment and test it with a sequence of mock responses.

```python
def classify_sentiment(llm, text_to_classify: str) -> str:
    """Classifies text sentiment using the provided LLM."""
    prompt = f"Classify the sentiment: {text_to_classify}"
    response = llm.generate(prompt)
    return response.content


# Configure a mock with a sequence of responses
mock_llm_sequential = MockLLM(
    responses=["Positive", "Negative", "Neutral"],
    behavior=MockBehavior.SEQUENTIAL
)

# Test a sequence of inputs
inputs = ["I love this!", "This is terrible.", "It is okay."]
for text in inputs:
    sentiment = classify_sentiment(mock_llm_sequential, text)
    print(f"'{text}' -> {sentiment}")

# The mock will have been called three times
print(f"Total calls: {mock_llm_sequential.call_count}")
```

Each time `classify_sentiment` calls the `generate` method, the mock provides the next response from the list, allowing you to test how your application handles a series of interactions.

### Validating Prompts with Pattern Matching

Sometimes the logic you need to test involves generating different prompts based on user input. For example, your application might route a user's request to different prompt templates. You can test this logic using `MockBehavior.PATTERN`. This mode uses a dictionary where keys are regular expression patterns and values are the corresponding responses; if no pattern matches, the mock falls back to its `default_response`.

```python
# Configure a mock with pattern-based responses
mock_llm_pattern = MockLLM(
    responses={
        r"summarize": "This is a summary.",
        r"translate.*spanish": "Hola.",
        r"classify": "Positive",
    },
    behavior=MockBehavior.PATTERN,
    default_response="Request not understood."
)

# Test different prompts
summary_prompt = "summarize this text"
translation_prompt = "translate to spanish: hello"
unknown_prompt = "tell me a joke"

print(f"'{summary_prompt}' -> '{mock_llm_pattern.generate(summary_prompt).content}'")
print(f"'{translation_prompt}' -> '{mock_llm_pattern.generate(translation_prompt).content}'")
print(f"'{unknown_prompt}' -> '{mock_llm_pattern.generate(unknown_prompt).content}'")
```

This allows you to verify that your application is constructing the correct prompts without needing a real LLM to interpret them.

### Inspecting LLM Interactions

In addition to checking your application's final output, you may need to verify exactly how it interacted with the LLM. The `MockLLM` instance tracks every call made to it, including the prompt and other parameters. You can access this information through the `call_count` and `call_history` attributes.

```python
mock_llm = MockLLM(responses="Test response", behavior=MockBehavior.FIXED)

summarize_text(mock_llm, "Text to summarize.")
classify_sentiment(mock_llm, "Text to classify.")

print(f"Total calls made: {mock_llm.call_count}")

# Inspect the last call
last_call = mock_llm.get_last_call()
print(f"Last prompt sent: {last_call['prompt']}")
```

This is especially useful for validating complex prompt engineering logic. You can use `assert_called_with()` to check if a specific substring was present in any of the prompts sent to the mock.

```python
# Continuing from the previous example...
try:
    # Check if a prompt contained the word "classify"
    mock_llm.assert_called_with("classify")
    print("Assertion passed: A prompt contained the word 'classify'.")
except AssertionError as e:
    print(f"Assertion failed: {e}")
```
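If you need to inspect every prompt rather than just the last one, you can iterate over `call_history`. The sketch below continues from the example above and assumes each recorded call is a dict with a `"prompt"` key, mirroring the value returned by `get_last_call()`; verify the exact record shape in your version of the module.

```python
# Continuing from the previous example. This assumes each entry in
# call_history is a dict with a "prompt" key, mirroring get_last_call().
for i, call in enumerate(mock_llm.call_history, start=1):
    print(f"Call {i} prompt: {call['prompt']}")

# Example check: every prompt our application built should be non-empty
assert all(call["prompt"] for call in mock_llm.call_history)
```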
By using mock objects, you can build a comprehensive and reliable test suite for your LLM applications. This practice isolates your application logic from the non-deterministic and external nature of LLMs, enabling you to develop with greater confidence and speed. Remember to reset the mock between tests using the `mock_llm.reset()` method to ensure each test is independent, as sketched below.
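For example, a shared mock can be cleared before every test with a small fixture. This is a minimal sketch assuming pytest; the fixture and test names and the `my_app` module are hypothetical, and `reset()` is assumed to clear the recorded calls so that counts do not leak between tests.

```python
# A minimal sketch assuming pytest. The fixture and test names are
# hypothetical, and reset() is assumed to clear recorded calls so that
# call counts do not leak between tests.
import pytest

from kerb.testing import MockLLM, MockBehavior

from my_app import summarize_text  # hypothetical module from the earlier example

shared_mock = MockLLM(
    responses="This is the expected summary.",
    behavior=MockBehavior.FIXED,
)


@pytest.fixture(autouse=True)
def clean_mock():
    # Reset before every test in this module
    shared_mock.reset()
    yield


def test_summary_content():
    summary = summarize_text(shared_mock, "Some text")
    assert summary == "This is the expected summary."


def test_summary_calls_llm_exactly_once():
    summarize_text(shared_mock, "Some text")
    assert shared_mock.call_count == 1
```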