Open Source

Kerb

LLM Development Toolkit

An open-source Python toolkit for building production-ready LLM applications. Modular utilities for every stage of your LLM workflow.

AI utilities that power ApX Machine Learning, now open source.

View on GitHub
pip install kerb

For LLM Developers

A collection of utilities designed for modern LLM applications.

Simple

Advanced LLM techniques made simple. Kerb provides clean, easy-to-use interfaces for complex operations.

Lightweight

Only install what you need. Kerb is modular, so you don't have to carry unnecessary dependencies.

Compatible

Suitable for any LLM project. Kerb is not a framework; it's a toolkit that plays well with others.

All Modules

Everything you need to build LLM applications

Agent

Agent orchestration and execution patterns for multi-step reasoning.

Cache

Response and embedding caching to reduce costs and latency.

Chunk

Text chunking utilities for optimal context windows and retrieval.

Config

Configuration management for models, providers, and application settings.

Context

Context window management and token budget tracking.

Document

Document loading and processing for PDFs, web pages, and more.

Embedding

Embedding generation and similarity search helpers.

Evaluation

Metrics and benchmarking tools for LLM outputs.

Fine-Tuning

Model fine-tuning utilities and large-scale dataset preparation.

Generation

Unified LLM generation with multi-provider support (OpenAI, Anthropic, Gemini).

Memory

Conversation memory and entity tracking for stateful applications.

Multimodal

Image, audio, and video processing for multimodal models.

Parsing

Output parsing and validation (JSON, structured data, function calls).

Preprocessing

Text cleaning and preprocessing for LLM inputs.

Prompt

Prompt engineering utilities, templates, and chain-of-thought patterns.

Retrieval

RAG and vector search utilities for semantic retrieval.

Safety

Content moderation and safety filters.

Testing

Testing utilities for LLM outputs and evaluation.

Tokenizer

Token counting and text splitting for any model.
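As a quick taste of the API surface, here is a minimal token-counting sketch with the Tokenizer module. Note that count_tokens is an assumed function name for illustration; check the module reference for the exact interface.

from kerb.tokenizer import count_tokens  # assumed import path and name

# Check a prompt against a token budget before sending it to a model
prompt = "Summarize the attached report in three bullet points."
print(f"Prompt uses {count_tokens(prompt)} tokens")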

Quick Start

Installation

# Install just the basics (no dependencies)
pip install kerb

# Or install with the features you need
pip install kerb[generation]  # For LLM generation
pip install kerb[embeddings]  # For embeddings
pip install kerb[all]         # Everything

1. Basic LLM Generation

Generate text with any major LLM provider:

from kerb.generation import generate, ModelName, LLMProvider

# Simple generation
response = generate(
    "Write a haiku about Python programming",
    model=ModelName.GPT_4O_MINI,
    provider=LLMProvider.OPENAI
)

print(response.content)
print(f"Tokens: {response.usage.total_tokens}, Cost: $ {response.cost:.6f}")

2. Text Chunking for RAG

Split large documents for LLM processing:

from kerb.chunk import overlap_chunker

long_text = """
Large Language Models have revolutionized natural language processing.
They can understand context, generate human-like text, and perform
various tasks from translation to code generation. However, working
with LLMs requires careful consideration of token limits, context windows,
and efficient text processing strategies.
""" # Your long document

chunks = overlap_chunker(
    long_text,
    chunk_size=80,
    overlap_ratio=0.15
)

print(f"Split into {len(chunks)} chunks with overlap")

3. Embeddings & Semantic Search

Generate embeddings and find similar content:

from kerb.embedding import embed, cosine_similarity, EmbeddingModel

# Generate embeddings
query_embedding = embed("machine learning algorithms", model=EmbeddingModel.ALL_MINILM_L6_V2)
doc_embedding = embed("neural networks and deep learning", model=EmbeddingModel.ALL_MINILM_L6_V2)

# Calculate similarity
similarity = cosine_similarity(query_embedding, doc_embedding)
print(f"Similarity: {similarity:.4f}")

4. Prompt Templates

Use templates for consistent prompts:

from kerb.prompt import render_template
from kerb.generation import generate, ModelName

template = """You are a {{role}} assistant.
Task: {{task}}
Context: {{context}}"""

prompt = render_template(template, {
    "role": "helpful Python",
    "task": "explain decorators",
    "context": "beginner level"
})

response = generate(prompt, model=ModelName.GPT_4O_MINI)
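Templates also pay off in batch jobs, where the same structure is rendered with different variables. Building on the template defined above:

for topic in ["decorators", "generators", "context managers"]:
    prompt = render_template(template, {
        "role": "helpful Python",
        "task": f"explain {topic}",
        "context": "beginner level"
    })
    response = generate(prompt, model=ModelName.GPT_4O_MINI)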

5. Document Loading

Load documents from various formats:

from kerb.document import load_document

# Auto-detects format (txt, md, json, csv, pdf, etc.)
doc = load_document("data/report.pdf")

print(f"Content: {doc.content[:200]}...")
print(f"Metadata: {doc.metadata}")

6. LLM Caching

Cache repeated calls to reduce cost and latency:

from kerb.cache import create_memory_cache, generate_prompt_key
from kerb.generation import generate, ModelName

cache = create_memory_cache(max_size=1000, default_ttl=3600)

def cached_generate(prompt, model=ModelName.GPT_4O_MINI, temperature=0.7):
    cache_key = generate_prompt_key(
        prompt,
        model=model.value,
        temperature=temperature
    )
    
    if cached := cache.get(cache_key):
        return cached['response']
    
    response = generate(prompt, model=model, temperature=temperature)
    cache.set(cache_key, {'response': response, 'cost': response.cost})
    return response

# First call hits the API
response1 = cached_generate("Explain Python decorators briefly")

# Second call is served from the cache
response2 = cached_generate("Explain Python decorators briefly")
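Because each cache entry also stores the original cost, you can estimate savings. A small sketch continuing the code above:

entry = cache.get(generate_prompt_key(
    "Explain Python decorators briefly",
    model=ModelName.GPT_4O_MINI.value,
    temperature=0.7
))
if entry:
    print(f"Cache hit saved an estimated ${entry['cost']:.6f}")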

7. Complete RAG Pipeline

Put it all together:

from kerb.document import load_document
from kerb.chunk import overlap_chunker
from kerb.embedding import embed, embed_batch, cosine_similarity
from kerb.generation import generate, ModelName
from kerb.prompt import render_template

# 1. Load and chunk documents
doc = load_document("knowledge_base.txt")
chunks = overlap_chunker(doc.content, chunk_size=500, overlap_ratio=0.15)

# 2. Create embeddings
chunk_embeddings = embed_batch(chunks)

# 3. Query and retrieve relevant chunks
query = "Why is my chatbot hallucinating and how do I fix it?"
query_embedding = embed(query)

# Find most similar chunks
similarities = [cosine_similarity(query_embedding, emb) 
                for emb in chunk_embeddings]
top_indices = sorted(range(len(similarities)), 
                     key=lambda i: similarities[i], 
                     reverse=True)[:3]
relevant_chunks = [chunks[i] for i in top_indices]

# 4. Generate response with context
prompt = render_template("""Answer this question using the context below.

Context:
{{context}}

Question: {{question}}

Answer:""", {
    "context": "\n\n".join(relevant_chunks),
    "question": query
})

response = generate(prompt, model=ModelName.GPT_4O_MINI)
print(response.content)

8. Agent with Tools (Advanced)

Create an AI agent that can use tools:

from kerb.agent.patterns import ReActAgent
from kerb.generation import generate, ModelName

def llm_function(prompt: str) -> str:
    """Connect agent to your LLM."""
    response = generate(prompt, model=ModelName.GPT_4O_MINI)
    return response.content

agent = ReActAgent(
    name="ResearchAgent",
    llm_func=llm_function,
    max_iterations=5
)

result = agent.run("Explain RAG like I'm a backend developer who just discovered AI exists")
print(result.output)
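The snippet above wires the agent to an LLM but registers no tools. The exact tool-registration API depends on your kerb version, so treat the following as a hypothetical sketch rather than the documented interface:

def search_docs(query: str) -> str:
    """Toy tool: answer from a hard-coded knowledge base."""
    return "RAG retrieves relevant chunks and passes them to the model as context."

agent = ReActAgent(
    name="ResearchAgent",
    llm_func=llm_function,
    tools=[search_docs],  # hypothetical parameter; check the agent module docs
    max_iterations=5
)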