A collection of utilities designed for modern LLM applications.
Simple
Advanced LLM techniques made simple. Kerb provides clean, easy-to-use interfaces for complex operations.
Lightweight
Only install what you need. Kerb is modular, so you don't have to carry unnecessary dependencies.
Compatible
Suitable for any LLM project. Kerb is not a framework; it's a toolkit that plays well with others.
Everything you need to build LLM applications
Agent
Agent orchestration and execution patterns for multi-step reasoning.
Cache
Response and embedding caching to reduce costs and latency.
Chunk
Text chunking utilities for optimal context windows and retrieval.
Config
Configuration management for models, providers, and application settings.
Context
Context window management and token budget tracking.
Document
Document loading and processing for PDFs, web pages, and more.
Embedding
Embedding generation and similarity search helpers.
Evaluation
Metrics and benchmarking tools for LLM outputs.
Fine-Tuning
Model fine-tuning utilities and large dataset preparation.
Generation
Unified LLM generation with multi-provider support (OpenAI, Anthropic, Gemini).
Memory
Conversation memory and entity tracking for stateful applications.
Multimodal
Image, audio, and video processing for multimodal models.
Parsing
Output parsing and validation (JSON, structured data, function calls).
Preprocessing
Text cleaning and preprocessing for LLM inputs.
Prompt
Prompt engineering utilities, templates, and chain-of-thought patterns.
Retrieval
RAG and vector search utilities for semantic retrieval.
Safety
Content moderation and safety filters.
Testing
Testing utilities for LLM outputs and evaluation.
Tokenizer
Token counting and text splitting for any model.
# Install just the basics (no dependencies)
pip install kerb
# Or install with the features you need
pip install kerb[generation] # For LLM generation
pip install kerb[embeddings] # For embeddings
pip install kerb[all] # Everything
Generate text with any major LLM provider:
from kerb.generation import generate, ModelName, LLMProvider
# Simple generation
response = generate(
    "Write a haiku about Python programming",
    model=ModelName.GPT_4O_MINI,
    provider=LLMProvider.OPENAI
)
print(response.content)
print(f"Tokens: {response.usage.total_tokens}, Cost: ${response.cost:.6f}")
Split large documents for LLM processing:
from kerb.chunk import overlap_chunker
long_text = """
Large Language Models have revolutionized natural language processing.
They can understand context, generate human-like text, and perform
various tasks from translation to code generation. However, working
with LLMs requires careful consideration of token limits, context windows,
and efficient text processing strategies.
""" # Your long document
chunks = overlap_chunker(
    long_text,
    chunk_size=80,
    overlap_ratio=0.15
)
print(f"Split into {len(chunks)} chunks with overlap")
Generate embeddings and find similar content:
from kerb.embedding import embed, cosine_similarity, EmbeddingModel
# Generate embeddings
query_embedding = embed("machine learning algorithms", model=EmbeddingModel.ALL_MINILM_L6_V2)
doc_embedding = embed("neural networks and deep learning", model=EmbeddingModel.ALL_MINILM_L6_V2)
# Calculate similarity
similarity = cosine_similarity(query_embedding, doc_embedding)
print(f"Similarity: {similarity:.4f}")
Use templates for consistent prompts:
from kerb.prompt import render_template
from kerb.generation import generate, ModelName
template = """You are a {{role}} assistant.
Task: {{task}}
Context: {{context}}"""
prompt = render_template(template, {
    "role": "helpful Python",
    "task": "explain decorators",
    "context": "beginner level"
})
response = generate(prompt, model=ModelName.GPT_4O_MINI)
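The same template can be re-rendered with different variables to keep prompts consistent across tasks:
prompt2 = render_template(template, {
    "role": "patient SQL",
    "task": "explain window functions",
    "context": "intermediate level"
})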
Load documents from various formats:
from kerb.document import load_document
# Auto-detects format (txt, md, json, csv, pdf, etc.)
doc = load_document("data/report.pdf")
print(f"Content: {doc.content[:200]}...")
print(f"Metadata: {doc.metadata}")
Cache repeated calls to reduce cost and latency:
from kerb.cache import create_memory_cache, generate_prompt_key
from kerb.generation import generate, ModelName
cache = create_memory_cache(max_size=1000, default_ttl=3600)
def cached_generate(prompt, model=ModelName.GPT_4O_MINI, temperature=0.7):
    cache_key = generate_prompt_key(
        prompt,
        model=model.value,
        temperature=temperature
    )
    if cached := cache.get(cache_key):
        return cached['response']
    response = generate(prompt, model=model, temperature=temperature)
    cache.set(cache_key, {'response': response, 'cost': response.cost})
    return response
# First call hits the LLM provider
response1 = cached_generate("Explain Python decorators briefly")
# Second call is served from the cache
response2 = cached_generate("Explain Python decorators briefly")
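Both calls return the same response; only the first incurs provider cost:
print(response1.content == response2.content)  # True, the second call never hit the API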
Put it all together:
from kerb.document import load_document
from kerb.chunk import overlap_chunker
from kerb.embedding import embed, embed_batch, cosine_similarity
from kerb.generation import generate, ModelName
from kerb.prompt import render_template
# 1. Load and chunk documents
doc = load_document("knowledge_base.txt")
chunks = overlap_chunker(doc.content, chunk_size=500, overlap_ratio=0.15)
# 2. Create embeddings
chunk_embeddings = embed_batch(chunks)
# 3. Query and retrieve relevant chunks
query = "Why is my chatbot hallucinating and how do I fix it?"
query_embedding = embed(query)
# Find most similar chunks
similarities = [cosine_similarity(query_embedding, emb)
                for emb in chunk_embeddings]
top_indices = sorted(range(len(similarities)),
                     key=lambda i: similarities[i],
                     reverse=True)[:3]
relevant_chunks = [chunks[i] for i in top_indices]
# 4. Generate response with context
prompt = render_template("""Answer this question using the context below.
Context:
{{context}}
Question: {{question}}
Answer:""", {
"context": "\n\n".join(relevant_chunks),
"question": query
})
response = generate(prompt, model=ModelName.GPT_4O_MINI)
print(response.content)
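To sanity-check retrieval before reading the generated answer, print the top-ranked chunks with their similarity scores:
for rank, i in enumerate(top_indices, start=1):
    print(f"{rank}. score={similarities[i]:.4f} :: {chunks[i][:80]}...")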
Create an AI agent that can use tools:
from kerb.agent.patterns import ReActAgent
from kerb.generation import generate, ModelName
def llm_function(prompt: str) -> str:
"""Connect agent to your LLM."""
response = generate(prompt, model=ModelName.GPT_4O_MINI)
return response.content
agent = ReActAgent(
name="ResearchAgent",
llm_func=llm_function,
max_iterations=5
)
result = agent.run("Explain RAG like I'm a backend developer who just discovered AI exists")
print(result.output)