The Kerb Toolkit makes your first call to a model simple. It provides a unified function, generate(), that serves as the primary interface for all text generation tasks, regardless of the underlying provider. This design lets you write your application logic once and switch between models from OpenAI, Anthropic, Google, or others by changing a few parameters.
The generate() function is the main entry point for interacting with LLMs. It abstracts away the provider-specific details of API calls, such as request formatting and authentication, allowing you to focus on your application's logic.
At its simplest, you can pass a string prompt to the function. Let's make our first call to generate a short poem.
from kerb.generation import generate, ModelName, LLMProvider

# Make a generation call to OpenAI's GPT-4o-mini model
response = generate(
    "Write a short, three-line poem about Python code.",
    model=ModelName.GPT_4O_MINI,
    provider=LLMProvider.OPENAI
)

# The generated text is in the 'content' attribute
print(response.content)
In this example, we provide three main arguments:
- The prompt: a plain string containing the instruction for the model.
- model: the specific model to use, selected from the ModelName enum for clarity and to prevent typos.
- provider: the service hosting the model, selected from the LLMProvider enum.

This single function call handles the entire process: it finds the correct API key from your configuration, formats the request for the specified provider, sends it, and parses the response into a standardized object.
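If you have not configured credentials yet, an environment variable is usually the simplest option. The snippet below is a minimal sketch that assumes the toolkit reads standard provider variables such as OPENAI_API_KEY; check your configuration setup if your environment differs.

import os

# Assumption: provider credentials are resolved from standard environment
# variables (e.g., OPENAI_API_KEY for OpenAI).
if "OPENAI_API_KEY" not in os.environ:
    raise RuntimeError(
        "Set OPENAI_API_KEY before calling generate(), "
        "e.g. export OPENAI_API_KEY=... in your shell."
    )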
The generate() function returns a structured GenerationResponse object, not just the raw text. This object contains valuable metadata about the API call, which is useful for debugging, performance monitoring, and cost tracking.
Let's inspect the attributes of the response object from our previous call.
# Assuming 'response' is the GenerationResponse from the previous example
print(f"Model Used: {response.model}")
print(f"Provider: {response.provider.value}")
print(f"Output Text:\n{response.content}")
print(f"Latency: {response.latency:.3f} seconds")
# Access token usage and cost information
if response.usage:
    print(f"Total Tokens: {response.usage.total_tokens}")
    print(f"Input Tokens: {response.usage.input_tokens}")
    print(f"Output Tokens: {response.usage.output_tokens}")

if response.cost is not None:
    print(f"Estimated Cost: ${response.cost:.6f}")
This structured output provides several benefits:
- content: The text generated by the model.
- model and provider: Confirm which model and provider handled the request.
- latency: Measures the duration of the API call, helping you identify performance issues.
- usage: Provides a breakdown of token consumption, which is essential for managing context windows and predicting costs.
- cost: Gives an estimated cost for the call based on the provider's pricing, making it easier to monitor application expenses.

While a simple string is sufficient for many tasks, most modern chat-based models are optimized for a structured list of messages. This format allows you to define roles for different parts of the conversation, such as system instructions, user queries, and previous assistant responses. The toolkit supports this through the Message object.
Here is how you can structure a prompt with a system message to guide the model's behavior and a user message containing the query.
from kerb.core import Message
from kerb.core.types import MessageRole
# Create a list of messages
messages = [
    Message(role=MessageRole.SYSTEM, content="You are a helpful assistant that explains programming concepts in one sentence."),
    Message(role=MessageRole.USER, content="Explain what a list comprehension is in Python.")
]

response = generate(
    messages,
    model=ModelName.GPT_4O_MINI,
    provider=LLMProvider.OPENAI
)
print(response.content)
Using a message list is the recommended approach for building conversational applications, as it provides a clear and organized way to manage dialogue history. For convenience, you can also use a list of dictionaries with "role" and "content" keys instead of Message objects.
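For example, the request above can be written with plain dictionaries; the keys mirror the role and content fields of the Message object.

# Equivalent request using dictionaries instead of Message objects
messages = [
    {"role": "system", "content": "You are a helpful assistant that explains programming concepts in one sentence."},
    {"role": "user", "content": "Explain what a list comprehension is in Python."},
]

response = generate(
    messages,
    model=ModelName.GPT_4O_MINI,
    provider=LLMProvider.OPENAI
)
print(response.content)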
You can influence the model's output by passing additional parameters to the generate() function. Two of the most common parameters are temperature and max_tokens.
- temperature: A value between 0 and 2 that controls randomness. Lower values (e.g., 0.2) make the output more deterministic and focused, while higher values (e.g., 0.8) make it more creative and diverse.
- max_tokens: The maximum number of tokens to generate in the response. This helps control the length of the output and manage costs.

response = generate(
    "Write a creative, one-paragraph story about a robot who learns to paint.",
    model=ModelName.GPT_4O_MINI,
    provider=LLMProvider.OPENAI,
    temperature=0.8,  # Higher temperature for more creativity
    max_tokens=150    # Limit the story's length
)
print(response.content)
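To see the effect of temperature directly, a small sketch like the following runs the same prompt at a low and a high setting. Outputs will differ between runs, since sampling is not fully deterministic even at low temperatures.

prompt = "Describe the ocean in one sentence."

for temp in (0.2, 1.0):
    result = generate(
        prompt,
        model=ModelName.GPT_4O_MINI,
        provider=LLMProvider.OPENAI,
        temperature=temp,  # Low value first, then a more creative setting
        max_tokens=60
    )
    print(f"temperature={temp}: {result.content}\n")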
For managing multiple parameters, you can use a GenerationConfig object. This allows you to define a reusable set of parameters for different parts of your application, ensuring consistency.
from kerb.generation import GenerationConfig
# Create a reusable configuration
config = GenerationConfig(
    temperature=0.2,
    max_tokens=100,
    top_p=0.9
)

response = generate(
    "What is an API?",
    model=ModelName.GPT_4O_MINI,
    provider=LLMProvider.OPENAI,
    config=config
)
)
print(response.content)
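Because the configuration is an ordinary object, you can reuse it across many calls to keep settings consistent, as in this short sketch:

# Reuse the same configuration for a batch of related prompts
questions = [
    "What is an API?",
    "What is a REST endpoint?",
    "What is JSON?",
]

for question in questions:
    answer = generate(
        question,
        model=ModelName.GPT_4O_MINI,
        provider=LLMProvider.OPENAI,
        config=config
    )
    print(f"Q: {question}\nA: {answer.content}\n")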
One of the significant advantages of the unified interface is the ability to switch between different LLM providers with minimal code changes. This is valuable for A/B testing models, choosing the most cost-effective option for a given task, or adding redundancy to your application.
For example, switching from an OpenAI model to an Anthropic model is as simple as changing the model and provider arguments.
# Call using an OpenAI model
response_openai = generate(
    "Name three benefits of using Python.",
    model=ModelName.GPT_4O_MINI,
    provider=LLMProvider.OPENAI
)
print(f"OpenAI Response:\n{response_openai.content}\n")

# Call using an Anthropic model
try:
    response_anthropic = generate(
        "Name three benefits of using Python.",
        model=ModelName.CLAUDE_35_HAIKU,
        provider=LLMProvider.ANTHROPIC
    )
    print(f"Anthropic Response:\n{response_anthropic.content}")
except Exception as e:
    print(f"Could not call Anthropic API: {e}")
With this foundation, you can now reliably generate text from any supported LLM. In the next section, we will explore how to handle streaming responses, which is important for creating real-time, interactive applications like chatbots.