As your Large Language Model applications evolve from simple scripts into more complex systems, the way you organize your code becomes increasingly important. Just as in traditional software development, a well-structured codebase is easier to understand, maintain, test, and extend. This is especially true for LLM applications, which often involve unique components like prompt templates, interaction logic with external APIs, and specific data handling for model inputs and outputs. Adopting good structural practices early on will save significant effort down the line, particularly when collaborating with others or preparing for deployment.
A fundamental principle in software design is the separation of concerns. This means that different parts of your application should be responsible for distinct functionalities. Applying this to LLM applications helps manage complexity. Consider isolating common concerns such as configuration management (API keys, model names), LLM API interaction, prompt loading and templating, output parsing and validation, and the core application logic that ties them together.
A logical directory structure makes navigating and understanding your project much easier. The ideal structure depends on the complexity of your application, but here are a couple of common patterns:
Simple Application Structure:
For smaller projects, a flat structure might suffice, clearly separating concerns into different Python modules:
my_llm_app/
├── app.py # Main application logic or web server (e.g., Flask)
├── config.py # Loads configuration (API keys via env vars, model names)
├── llm_client.py # Functions/Class for LLM API interaction
├── prompt_utils.py # Helper functions for loading/formatting prompts
├── prompts/ # Directory for storing prompt template files
│ ├── summarize.txt
│ └── qa_cot.txt
├── utils.py # General utility functions (e.g., output parsing)
├── .env # Environment variables (add to .gitignore!)
└── requirements.txt # Project dependencies
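To see how these modules divide the work, here is a minimal, hypothetical sketch of app.py wiring them together as a small Flask endpoint. The helper names (get_prompt, complete) and the template filename are illustrative placeholders, not part of any specific library.

# app.py (illustrative sketch; helper names are hypothetical)
from flask import Flask, request, jsonify

from config import MODEL_NAME         # configuration stays in config.py
from prompt_utils import get_prompt   # prompt loading/formatting stays in prompt_utils.py
from llm_client import complete       # LLM API interaction stays in llm_client.py

app = Flask(__name__)

@app.route("/summarize", methods=["POST"])
def summarize():
    text = request.json["text"]
    prompt = get_prompt("summarize.txt", text_to_summarize=text, target_sentences=3)
    summary = complete(prompt, model=MODEL_NAME)
    return jsonify({"summary": summary})

Each concern lives in exactly one module, so changing prompt wording or switching providers touches a single file.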
More Complex Application Structure:
As applications grow, especially when incorporating frameworks like LangChain or involving multiple distinct features (like Q&A, summarization, RAG), a more layered or feature-based structure is beneficial:
advanced_llm_app/
├── main.py # Main entry point (e.g., starts web server or CLI)
├── core/ # Core shared components
│ ├── __init__.py
│ ├── config.py # Configuration loading (env vars, files)
│ ├── llm_interface.py # Abstracted LLM interaction logic
│ ├── prompt_manager.py # Centralized prompt loading/templating
│ └── output_parser.py # Shared output parsing utilities
├── modules/ # Application features/modules
│ ├── __init__.py
│ ├── qa/ # Question-Answering module
│ │ ├── __init__.py
│ │ ├── chain.py # Logic specific to Q&A (e.g., LangChain chain)
│ │ └── prompts/ # Prompts specific to Q&A
│ │ └── retrieval_qa.yaml
│ ├── summarization/ # Summarization module
│ │ ├── __init__.py
│ │ ├── service.py # Summarization specific logic
│ │ └── prompts/ # Prompts specific to summarization
│ │ └── condense_document.txt
│ └── rag/ # RAG components (if used)
│ ├── __init__.py
│ ├── retriever.py
│ └── vector_store.py
├── shared/ # Shared utilities not core to LLM interaction
│ └── data_models.py # Pydantic models for validation
├── tests/ # Unit and integration tests
│ ├── core/
│ └── modules/
├── .env # Environment variables
└── requirements.txt
Visualizing the dependencies in a more complex structure can help understand the flow of information:
[Diagram: Dependency flow in a structured LLM application. Feature modules utilize core services, which handle configuration and direct LLM interactions.]
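As a concrete illustration of that flow, a feature module imports only from core and never talks to the provider SDK directly. The names used below (LLMClient, get_prompt, strip_preamble) are hypothetical placeholders for whatever your core layer exposes.

# modules/summarization/service.py (illustrative sketch; imported names are hypothetical)
from core.llm_interface import LLMClient       # direct LLM interaction lives in core
from core.prompt_manager import get_prompt     # centralized prompt loading/templating
from core.output_parser import strip_preamble  # shared output parsing utility

_client = LLMClient()

def summarize(text: str, target_sentences: int = 3) -> str:
    # The feature module only orchestrates: build prompt -> call LLM -> parse output.
    prompt = get_prompt("condense_document.txt",
                        text_to_summarize=text,
                        target_sentences=target_sentences)
    raw_output = _client.complete(prompt)
    return strip_preamble(raw_output)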
Beyond directory structure, think about designing reusable code components (functions, classes, or modules).

LLM interaction wrapper: Create a dedicated module or class for all communication with the model API (e.g., llm_client.py or core/llm_interface.py). This component can encapsulate details like how parameters such as model, temperature, and max_tokens are passed, so callers never deal with the provider SDK directly. A minimal sketch of such a wrapper appears after this list.

Prompt management: A PromptManager class or utility functions that can load prompt templates from dedicated files (e.g., .txt, .json, .yaml) and format them with runtime values.

Framework components: If you use a framework like LangChain, its LLM wrappers, PromptTemplate, OutputParser, Chain, Agent, and Retriever classes inherently encourage a modular design. Structure your code around these components.
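Below is a minimal sketch of what such a wrapper might look like, assuming the OpenAI Python SDK (v1+); the class name, defaults, and the DEFAULT_MODEL variable are illustrative choices, not requirements.

# core/llm_interface.py (illustrative sketch, assuming the openai SDK v1+)
import os
from openai import OpenAI

class LLMClient:
    """Thin wrapper so the rest of the app never touches the provider SDK directly."""

    def __init__(self, model=None, temperature=0.7, max_tokens=500):
        self.model = model or os.getenv("DEFAULT_MODEL", "gpt-3.5-turbo")
        self.temperature = temperature
        self.max_tokens = max_tokens
        self._client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def complete(self, prompt, **overrides):
        # Per-call overrides (e.g., temperature=0) take precedence over instance defaults.
        response = self._client.chat.completions.create(
            model=overrides.get("model", self.model),
            temperature=overrides.get("temperature", self.temperature),
            max_tokens=overrides.get("max_tokens", self.max_tokens),
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content

Because every call goes through one place, swapping models, adding retries, or mocking the client in tests becomes a local change.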
Hardcoding API keys, model names, or file paths directly into your application code is fragile and insecure. Externalize configuration using:
Environment variables: Use python-dotenv to load variables from a .env file during local development (ensure .env is added to your .gitignore). In deployed environments, these variables are typically set through the hosting platform.
# config.py
import os
from dotenv import load_dotenv
load_dotenv() # Load variables from .env file
API_KEY = os.getenv("OPENAI_API_KEY")
MODEL_NAME = os.getenv("DEFAULT_MODEL", "gpt-3.5-turbo")
Configuration files: For non-secret settings, dedicated files (e.g., config.yaml, settings.toml) are often clearer. Libraries like PyYAML, tomli, or hydra-core can help load and manage these.
# config.yaml
llm:
  default_model: "claude-3-sonnet-20240229"
  temperature: 0.7
  max_tokens: 500
paths:
  prompts_dir: "./prompts"
features:
  enable_rag: true
# config.py (using PyYAML)
import yaml
import os

def load_config(path="config.yaml"):
    with open(path, 'r') as f:
        config = yaml.safe_load(f)
    # Allow overriding with environment variables if needed
    config["llm"]["api_key"] = os.getenv("ANTHROPIC_API_KEY")
    return config

CONFIG = load_config()
MODEL_NAME = CONFIG.get("llm", {}).get("default_model", "unknown-model")
Combining these methods (e.g., loading defaults from a file and overriding with environment variables) provides flexibility.
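As a small illustration of that precedence, an environment variable (if set) can override the file-based default at lookup time; the DEFAULT_MODEL variable name here is purely illustrative.

# Continuing config.py: an env var (if set) wins over the value loaded from config.yaml
MODEL_NAME = os.getenv("DEFAULT_MODEL", CONFIG["llm"]["default_model"])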
Prompts define your LLM's behavior. Treat them as first-class citizens in your project structure:
Store them outside your application code in dedicated files (e.g., under a /prompts directory). Use clear naming conventions. For inserting variables, templating tools such as Jinja2 or framework classes like LangChain's PromptTemplate are good choices.
# prompts/summarize_template.j2
Summarize the following text in {{ target_sentences }} sentences:
{{ text_to_summarize }}
Summary:
# prompt_utils.py
from jinja2 import Environment, FileSystemLoader
import os
PROMPTS_DIR = os.path.join(os.path.dirname(__file__), 'prompts')
env = Environment(loader=FileSystemLoader(PROMPTS_DIR))
def get_prompt(template_name, **kwargs):
    template = env.get_template(template_name)
    return template.render(**kwargs)
# Usage elsewhere
# from prompt_utils import get_prompt
# my_prompt = get_prompt("summarize_template.j2", target_sentences=3, text_to_summarize=user_input)
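If you prefer a framework-native approach over Jinja2, LangChain's PromptTemplate covers the same need. The import path below assumes a recent LangChain release where the class lives in langchain_core; older versions expose it via langchain.prompts.

# Equivalent templating with LangChain's PromptTemplate
from langchain_core.prompts import PromptTemplate

summarize_prompt = PromptTemplate.from_template(
    "Summarize the following text in {target_sentences} sentences:\n\n"
    "{text_to_summarize}\n\nSummary:"
)

# Usage elsewhere
# prompt_text = summarize_prompt.format(target_sentences=3, text_to_summarize=user_input)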
A well-structured application makes it easier to add robust logging and error handling:
Logging: Integrate Python's built-in logging module early on. Log important events, decisions, inputs/outputs (potentially sanitized), and errors. Structured logging (e.g., JSON format) can be beneficial for later analysis.
Error handling: Use try...except blocks strategically, especially around external calls (LLM APIs, database lookups) and data parsing/validation. Define custom exception classes if needed to differentiate application-specific errors.

Structuring your LLM application code thoughtfully from the beginning lays a solid foundation. It improves clarity and makes implementing the subsequent considerations in this chapter, such as securing API keys, monitoring costs, effective testing, and eventual deployment, a much more manageable process.