As your Large Language Model applications evolve from simple scripts into more complex systems, the way you organize your code becomes increasingly important. Just as in traditional software development, a well-structured codebase is easier to understand, maintain, test, and extend. This is especially true for LLM applications, which often involve unique components like prompt templates, interaction logic with external APIs, and specific data handling for model inputs and outputs. Adopting good structural practices early on will save significant effort down the line, particularly when collaborating with others or preparing for deployment.
A fundamental principle in software design is the separation of concerns. This means that different parts of your application should be responsible for distinct functionalities. Applying this to LLM applications helps manage complexity. Consider isolating common concerns such as configuration management (API keys, model names), LLM API interaction, prompt loading and templating, output parsing and validation, and the core application logic that ties them together.
A logical directory structure makes navigating and understanding your project much easier. The ideal structure depends on the complexity of your application, but here are a couple of common patterns:
Simple Application Structure:
For smaller projects, a flat structure might suffice, clearly separating concerns into different Python modules:
my_llm_app/
├── app.py # Main application logic or web server (e.g., Flask)
├── config.py # Loads configuration (API keys via env vars, model names)
├── llm_client.py # Functions/Class for LLM API interaction
├── prompt_utils.py # Helper functions for loading/formatting prompts
├── prompts/ # Directory for storing prompt template files
│ ├── summarize.txt
│ └── qa_cot.txt
├── utils.py # General utility functions (e.g., output parsing)
├── .env # Environment variables (add to .gitignore!)
└── requirements.txt # Project dependencies
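To see how these modules divide the work, here is a minimal, hypothetical sketch of app.py wiring them together as a small Flask endpoint. The helper names (get_prompt, complete) and the template filename are illustrative placeholders, not part of any specific library.

# app.py (illustrative sketch; helper names are hypothetical)
from flask import Flask, request, jsonify

from config import MODEL_NAME         # configuration stays in config.py
from prompt_utils import get_prompt   # prompt loading/formatting stays in prompt_utils.py
from llm_client import complete       # LLM API interaction stays in llm_client.py

app = Flask(__name__)

@app.route("/summarize", methods=["POST"])
def summarize():
    text = request.json["text"]
    prompt = get_prompt("summarize.txt", text_to_summarize=text, target_sentences=3)
    summary = complete(prompt, model=MODEL_NAME)
    return jsonify({"summary": summary})

Each concern lives in exactly one module, so changing prompt wording or switching providers touches a single file.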
More Complex Application Structure:
As applications grow, especially when incorporating frameworks like LangChain or involving multiple distinct features (like Q&A, summarization, RAG), a more layered or feature-based structure is beneficial:
advanced_llm_app/
├── main.py # Main entry point (e.g., starts web server or CLI)
├── core/ # Core shared components
│ ├── __init__.py
│ ├── config.py # Configuration loading (env vars, files)
│ ├── llm_interface.py # Abstracted LLM interaction logic
│ ├── prompt_manager.py # Centralized prompt loading/templating
│ └── output_parser.py # Shared output parsing utilities
├── modules/ # Application features/modules
│ ├── __init__.py
│ ├── qa/ # Question-Answering module
│ │ ├── __init__.py
│ │ ├── chain.py # Logic specific to Q&A (e.g., LangChain chain)
│ │ └── prompts/ # Prompts specific to Q&A
│ │ └── retrieval_qa.yaml
│ ├── summarization/ # Summarization module
│ │ ├── __init__.py
│ │ ├── service.py # Summarization specific logic
│ │ └── prompts/ # Prompts specific to summarization
│ │ └── condense_document.txt
│ └── rag/ # RAG components (if used)
│ ├── __init__.py
│ ├── retriever.py
│ └── vector_store.py
├── shared/ # Shared utilities not core to LLM interaction
│ └── data_models.py # Pydantic models for validation
├── tests/ # Unit and integration tests
│ ├── core/
│ └── modules/
├── .env # Environment variables
└── requirements.txt
Visualizing the dependencies in a more complex structure can help understand the flow of information:
[Diagram: Dependency flow in a structured LLM application. Feature modules utilize core services, which handle configuration and direct LLM interactions.]
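As a concrete illustration of that flow, a feature module imports only from core and never talks to the provider SDK directly. The names used below (LLMClient, get_prompt, strip_preamble) are hypothetical placeholders for whatever your core layer exposes.

# modules/summarization/service.py (illustrative sketch; imported names are hypothetical)
from core.llm_interface import LLMClient       # direct LLM interaction lives in core
from core.prompt_manager import get_prompt     # centralized prompt loading/templating
from core.output_parser import strip_preamble  # shared output parsing utility

_client = LLMClient()

def summarize(text: str, target_sentences: int = 3) -> str:
    # The feature module only orchestrates: build prompt -> call LLM -> parse output.
    prompt = get_prompt("condense_document.txt",
                        text_to_summarize=text,
                        target_sentences=target_sentences)
    raw_output = _client.complete(prompt)
    return strip_preamble(raw_output)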
Beyond directory structure, think about designing reusable code components (functions, classes, or modules).

LLM interaction wrapper: Create a dedicated module or class for all communication with the model API (e.g., llm_client.py or core/llm_interface.py). This component can encapsulate details like how parameters such as model, temperature, and max_tokens are passed, so callers never deal with the provider SDK directly. A minimal sketch of such a wrapper appears after this list.

Prompt management: A PromptManager class or utility functions that can load prompt templates from dedicated files (e.g., .txt, .json, .yaml) and format them with runtime values.

Framework components: If you use a framework like LangChain, its LLM wrappers, PromptTemplate, OutputParser, Chain, Agent, and Retriever classes inherently encourage a modular design. Structure your code around these components.
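Below is a minimal sketch of what such a wrapper might look like, assuming the OpenAI Python SDK (v1+); the class name, defaults, and the DEFAULT_MODEL variable are illustrative choices, not requirements.

# core/llm_interface.py (illustrative sketch, assuming the openai SDK v1+)
import os
from openai import OpenAI

class LLMClient:
    """Thin wrapper so the rest of the app never touches the provider SDK directly."""

    def __init__(self, model=None, temperature=0.7, max_tokens=500):
        self.model = model or os.getenv("DEFAULT_MODEL", "gpt-3.5-turbo")
        self.temperature = temperature
        self.max_tokens = max_tokens
        self._client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def complete(self, prompt, **overrides):
        # Per-call overrides (e.g., temperature=0) take precedence over instance defaults.
        response = self._client.chat.completions.create(
            model=overrides.get("model", self.model),
            temperature=overrides.get("temperature", self.temperature),
            max_tokens=overrides.get("max_tokens", self.max_tokens),
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content

Because every call goes through one place, swapping models, adding retries, or mocking the client in tests becomes a local change.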
Hardcoding API keys, model names, or file paths directly into your application code is fragile and insecure. Externalize configuration using:
Environment variables: Use python-dotenv to load variables from a .env file during local development (ensure .env is added to your .gitignore). In deployed environments, these variables are typically set through the hosting platform.
# config.py
import os
from dotenv import load_dotenv
load_dotenv() # Load variables from .env file
API_KEY = os.getenv("OPENAI_API_KEY")
MODEL_NAME = os.getenv("DEFAULT_MODEL", "gpt-3.5-turbo")
Configuration files: For non-secret settings, dedicated files (e.g., config.yaml, settings.toml) are often clearer. Libraries like PyYAML, tomli, or hydra-core can help load and manage these.
# config.yaml
llm:
  default_model: "claude-3-sonnet-20240229"
  temperature: 0.7
  max_tokens: 500
paths:
  prompts_dir: "./prompts"
features:
  enable_rag: true
# config.py (using PyYAML)
import yaml
import os

def load_config(path="config.yaml"):
    with open(path, 'r') as f:
        config = yaml.safe_load(f)
    # Allow overriding with environment variables if needed
    config["llm"]["api_key"] = os.getenv("ANTHROPIC_API_KEY")
    return config

CONFIG = load_config()
MODEL_NAME = CONFIG.get("llm", {}).get("default_model", "unknown-model")
Combining these methods (e.g., loading defaults from a file and overriding with environment variables) provides flexibility.
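As a small illustration of that precedence, an environment variable (if set) can override the file-based default at lookup time; the DEFAULT_MODEL variable name here is purely illustrative.

# Continuing config.py: an env var (if set) wins over the value loaded from config.yaml
MODEL_NAME = os.getenv("DEFAULT_MODEL", CONFIG["llm"]["default_model"])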
Prompts define your LLM's behavior. Treat them as first-class citizens in your project structure:
Store them outside your application code in dedicated files (e.g., under a /prompts directory). Use clear naming conventions. For inserting variables, templating tools such as Jinja2 or framework classes like LangChain's PromptTemplate are good choices.
# prompts/summarize_template.j2
Summarize the following text in {{ target_sentences }} sentences:
{{ text_to_summarize }}
Summary:
# prompt_utils.py
from jinja2 import Environment, FileSystemLoader
import os
PROMPTS_DIR = os.path.join(os.path.dirname(__file__), 'prompts')
env = Environment(loader=FileSystemLoader(PROMPTS_DIR))
def get_prompt(template_name, **kwargs):
    template = env.get_template(template_name)
    return template.render(**kwargs)
# Usage elsewhere
# from prompt_utils import get_prompt
# my_prompt = get_prompt("summarize_template.j2", target_sentences=3, text_to_summarize=user_input)
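If you prefer a framework-native approach over Jinja2, LangChain's PromptTemplate covers the same need. The import path below assumes a recent LangChain release where the class lives in langchain_core; older versions expose it via langchain.prompts.

# Equivalent templating with LangChain's PromptTemplate
from langchain_core.prompts import PromptTemplate

summarize_prompt = PromptTemplate.from_template(
    "Summarize the following text in {target_sentences} sentences:\n\n"
    "{text_to_summarize}\n\nSummary:"
)

# Usage elsewhere
# prompt_text = summarize_prompt.format(target_sentences=3, text_to_summarize=user_input)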
A well-structured application makes it easier to add robust logging and error handling:
Logging: Integrate Python's built-in logging module early on. Log important events, decisions, inputs/outputs (potentially sanitized), and errors. Structured logging (e.g., JSON format) can be beneficial for later analysis.
Error handling: Use try...except blocks strategically, especially around external calls (LLM APIs, database lookups) and data parsing/validation. Define custom exception classes if needed to differentiate application-specific errors.

Structuring your LLM application code thoughtfully from the beginning lays a solid foundation. It improves clarity and makes implementing the subsequent considerations in this chapter, such as securing API keys, monitoring costs, effective testing, and eventual deployment, a much more manageable process.