Building applications with Large Language Models involves more than just sending a text string to an API. You need to manage different provider APIs, construct effective prompts, handle various data formats, and orchestrate complex workflows. The Kerb toolkit is designed to bring structure and simplicity to this process. Its architecture is guided by a few main principles that make building, testing, and maintaining LLM applications more efficient.
A significant challenge in the LLM ecosystem is the variety of APIs. OpenAI, Anthropic, and Google each have their own SDKs, data formats, and authentication methods. Switching between models from different providers can require substantial code changes, making experimentation and production flexibility difficult.
The toolkit addresses this with a unified generation interface. The generate() function serves as a single, consistent entry point for interacting with any supported LLM. You specify the provider and model, and the toolkit handles the provider-specific implementation details behind the scenes. This design makes your application portable and allows you to switch models with minimal code changes.
For example, generating text with OpenAI's GPT-4o-mini looks nearly identical to generating with Anthropic's Claude-3.5-Haiku.
from kerb.generation import generate, ModelName, LLMProvider
# Generate text using OpenAI
response_openai = generate(
    "What is the difference between a list and a tuple in Python?",
    model=ModelName.GPT_4O_MINI,
    provider=LLMProvider.OPENAI
)

# Generate text using Anthropic with the same interface
response_anthropic = generate(
    "What is the difference between a list and a tuple in Python?",
    model=ModelName.CLAUDE_35_HAIKU,
    provider=LLMProvider.ANTHROPIC
)
print(f"OpenAI Response: {response_openai.content[:70]}...")
print(f"Anthropic Response: {response_anthropic.content[:70]}...")
This abstraction lets you focus on your application's logic rather than the specifics of each provider's API. Throughout the documentation, you will use this unified interface for all text generation tasks.
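To make that portability concrete, the short example below wraps the call in a small helper so that comparing providers means changing nothing but the entries in a list. The compare_providers helper is an illustrative convenience written for this example, not part of the toolkit; it relies only on the generate() interface shown above.

from kerb.generation import generate, ModelName, LLMProvider

# Illustrative helper (not part of the toolkit): send the same prompt to
# several provider/model pairs through the unified interface.
def compare_providers(prompt, targets):
    for provider, model in targets:
        response = generate(prompt, model=model, provider=provider)
        print(f"{provider}: {response.content[:70]}...")

compare_providers(
    "Summarize Python's GIL in one sentence.",
    targets=[
        (LLMProvider.OPENAI, ModelName.GPT_4O_MINI),
        (LLMProvider.ANTHROPIC, ModelName.CLAUDE_35_HAIKU),
    ],
)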
Modern LLM applications are often complex systems composed of multiple specialized components. A Retrieval-Augmented Generation (RAG) system, for example, requires components for loading documents, splitting them into chunks, creating embeddings, retrieving relevant information, and finally, generating a response.
The toolkit is organized into distinct, focused modules that correspond to these logical components. This modular design allows you to use only the parts you need and compose them to build sophisticated applications.
The toolkit's modules can be composed to build complex workflows like RAG systems and autonomous agents.
Each module has a specific responsibility:
- document & chunk: Load, preprocess, and split text data.
- embedding: Create numerical representations of text for semantic search.
- retrieval: Find relevant information from a knowledge base.
- prompt: Construct and manage dynamic prompts with templates.
- generation: Interact with LLMs through the unified interface.
- memory: Maintain state in conversational applications.
- agent: Build autonomous agents that can reason and use tools.

This separation of concerns makes your code cleaner, easier to test, and more maintainable. You can swap out a chunking strategy or an embedding model without altering the rest of your application.
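To make this composition concrete, here is a minimal sketch of a RAG-style flow. Only the generate() call uses the toolkit's actual interface (as shown earlier); the chunking and retrieval helpers are naive placeholder implementations written inline for illustration, and in a real application the document, chunk, embedding, and retrieval modules would take their place.

from kerb.generation import generate, ModelName, LLMProvider

# Naive placeholder helpers, for illustration only. In practice the
# document, chunk, embedding, and retrieval modules would replace these.

def split_into_chunks(text, chunk_size=300):
    """Split text into fixed-size chunks (stand-in for the chunk module)."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def retrieve_top_k(question, chunks, k=2):
    """Rank chunks by keyword overlap (stand-in for embedding + retrieval)."""
    terms = set(question.lower().split())
    ranked = sorted(chunks, key=lambda c: len(terms & set(c.lower().split())), reverse=True)
    return ranked[:k]

question = "How does the project handle rate limits?"
document_text = open("project_notes.txt").read()  # stand-in for the document module

chunks = split_into_chunks(document_text)
context = "\n\n".join(retrieve_top_k(question, chunks))

response = generate(
    f"Answer using only this context:\n{context}\n\nQuestion: {question}",
    model=ModelName.GPT_4O_MINI,
    provider=LLMProvider.OPENAI
)
print(response.content)

Because each step is isolated, swapping the naive helpers for the toolkit's own chunking and embedding-based retrieval only changes those two calls, leaving the rest of the flow untouched.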
Hardcoding model names, API keys, and generation parameters directly into your application logic leads to rigid and insecure code. The toolkit promotes a configuration-driven approach, where these settings are managed separately from the application code.
The config module provides a ConfigManager that centralizes all configuration. This allows you to define models, set provider details, and manage API keys in a structured way, often by loading them securely from environment variables.
from kerb.config import ConfigManager, ModelConfig
from kerb.config.enums import ProviderType
# Initialize a configuration manager
config = ConfigManager()
# Define a model configuration
gpt4_config = ModelConfig(
    name="gpt-4o-mini",
    provider=ProviderType.OPENAI,
    api_key_env_var="OPENAI_API_KEY",  # Securely load the key from the environment
    max_tokens=4096,
    temperature=0.7
)
# Add the model to the central configuration
config.add_model(gpt4_config)
# Later in your application, you can retrieve this configuration
retrieved_config = config.get_model("gpt-4o-mini")
print(f"Loaded config for {retrieved_config.name} with temp={retrieved_config.temperature}")
By separating configuration from logic, you can easily switch between development and production environments, update model parameters without changing code, and keep sensitive credentials out of your source control. We will put this principle into practice in the next section as we set up our first LLM provider.
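As a brief sketch of that environment switch, the example below registers a development model and a production model and then selects between them with an environment variable. It reuses the ConfigManager and ModelConfig calls shown above; the APP_ENV variable name, the "gpt-4o" entry, and its parameter values are assumptions made for illustration rather than a prescribed setup.

import os
from kerb.config import ConfigManager, ModelConfig
from kerb.config.enums import ProviderType

config = ConfigManager()

# Register a lightweight model for development and a stronger one for
# production. The "gpt-4o" entry and its parameters are illustrative.
config.add_model(ModelConfig(
    name="gpt-4o-mini",
    provider=ProviderType.OPENAI,
    api_key_env_var="OPENAI_API_KEY",
    max_tokens=4096,
    temperature=0.7
))
config.add_model(ModelConfig(
    name="gpt-4o",
    provider=ProviderType.OPENAI,
    api_key_env_var="OPENAI_API_KEY",
    max_tokens=4096,
    temperature=0.2
))

# Choose the model by environment without touching application logic.
# APP_ENV is an assumed convention for this example.
env = os.getenv("APP_ENV", "development")
model_name = "gpt-4o" if env == "production" else "gpt-4o-mini"
active_config = config.get_model(model_name)
print(f"[{env}] using {active_config.name} (temp={active_config.temperature})")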