This hands-on exercise focuses on building a practical Python tool: a database query tool. This tool allows an LLM agent to retrieve information from a relational database, a common requirement for agents needing access to structured data. The primary focus involves creating a Python function that connects to an SQLite database, executes a query, and returns results in an LLM-friendly format. This exercise covers defining tool interfaces, handling inputs and outputs, and interacting with an external data source.Scenario: Product Information AgentImagine an LLM agent designed to answer customer questions about products. The product information (names, categories, prices) is stored in a database. Our goal is to create a tool that the agent can use to query this database.Prerequisites: Setting up the DatabaseFirst, we need a simple database to work with. We'll use SQLite, which is built into Python, making setup straightforward. Let's create a small database named product_catalog.db with a products table.Create a Python script (e.g., setup_db.py) with the following content and run it once to create and populate your database:import sqlite3 def initialize_database(): """Creates and populates the product_catalog.db if it doesn't exist.""" db_path = 'product_catalog.db' conn = sqlite3.connect(db_path) cursor = conn.cursor() # Create products table cursor.execute(''' CREATE TABLE IF NOT EXISTS products ( id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT NOT NULL, category TEXT, price REAL, stock_quantity INTEGER ) ''') # Check if table is empty before inserting sample data cursor.execute("SELECT COUNT(*) FROM products") if cursor.fetchone()[0] == 0: sample_data = [ ('Laptop Pro', 'Electronics', 1299.99, 50), ('Wireless Mouse', 'Electronics', 25.50, 200), ('Coffee Maker', 'Appliances', 79.00, 75), ('Office Chair', 'Furniture', 150.75, 30), ('Python Programming Book', 'Books', 45.00, 120) ] cursor.executemany(''' INSERT INTO products (name, category, price, stock_quantity) VALUES (?, ?, ?, ?) ''', sample_data) print("Database initialized and populated with sample data.") else: print("Database already exists and contains data.") conn.commit() conn.close() if __name__ == '__main__': initialize_database()After running this script, you'll have a product_catalog.db file in the same directory.Designing the Database Query ToolOur tool needs to:Accept a SQL query and optional parameters from the LLM. For safety and simplicity in this exercise, we will restrict the tool to execute only SELECT statements.Connect to the product_catalog.db.Execute the query using parameterized inputs to prevent SQL injection.Fetch the results.Format the results as a JSON string, which is easily parsable by LLMs.Handle potential errors gracefully.Implementing the ToolLet's implement this tool as a Python function.import sqlite3 import json DB_PATH = 'product_catalog.db' # Define database path as a constant def query_product_database(sql_query: str, parameters: tuple = ()) -> str: """ Executes a read-only SQL query (SELECT) against the product_catalog.db database and returns the results. Args: sql_query (str): The SELECT SQL query to execute. Only SELECT statements are permitted. parameters (tuple): A tuple of parameters to safely substitute into the query. Use this for dynamic values in WHERE clauses. Example: ("Electronics",) for "WHERE category = ?" Returns: str: A JSON string representing the query results (list of dictionaries) or an error message if the query fails or is not allowed. """ # Input validation: Ensure it's a SELECT query if not sql_query.strip().upper().startswith("SELECT"): return json.dumps({ "error": "Invalid operation. Only SELECT queries are allowed for this tool." }) try: conn = sqlite3.connect(DB_PATH) cursor = conn.cursor() cursor.execute(sql_query, parameters) rows = cursor.fetchall() if not rows: return json.dumps({ "message": "Query executed successfully. No records found matching your criteria." }) # Get column names from cursor.description column_names = [description[0] for description in cursor.description] # Convert list of tuples to list of dictionaries results = [dict(zip(column_names, row)) for row in rows] return json.dumps(results, indent=2) # indent for readability, optional for LLM except sqlite3.IntegrityError as e: # Example of specific DB error return json.dumps({ "error": f"Database integrity error: {str(e)}. Check your query parameters and table constraints." }) except sqlite3.OperationalError as e: # Example for table not found, syntax errors return json.dumps({ "error": f"Database operational error: {str(e)}. Check your SQL syntax or if the table exists." }) except sqlite3.Error as e: # Catch other sqlite3 specific errors return json.dumps({ "error": f"A database error occurred: {str(e)}" }) except Exception as e: # Catch any other unexpected errors return json.dumps({ "error": f"An unexpected error occurred during query execution: {str(e)}" }) finally: if 'conn' in locals() and conn: conn.close() Aspects of this implementation:Docstring: The docstring is detailed. This is important because LLM agent frameworks often use docstrings to understand what a tool does and how to use its parameters.Type Hinting: sql_query: str, parameters: tuple, -> str improves code readability and can be used by agent frameworks.Input Validation: We explicitly check if the query starts with SELECT. This is a simple but important security measure for a tool that executes arbitrary SQL. In more advanced scenarios, you might parse the SQL more deeply or use a query builder.Parameterized Queries: cursor.execute(sql_query, parameters) is used. This is the standard way to prevent SQL injection vulnerabilities. The LLM would be instructed to provide the query structure and the values separately.Result Formatting: Results are converted into a list of dictionaries, where each dictionary represents a row and keys are column names. This structure is then serialized to a JSON string, a format LLMs can easily process.Error Handling: try...except blocks are used to catch various database-related errors and other exceptions, returning a JSON object with an error message. This provides useful feedback to the LLM if something goes wrong.Connection Management: The database connection is closed in a finally block to ensure it's always released, even if errors occur.Testing the Tool DirectlyBefore integrating this tool with an LLM agent, it's good practice to test it directly. You can add the following to your Python script to see it in action:if __name__ == '__main__': print("Ensure 'product_catalog.db' exists. Run 'setup_db.py' if not.\n") # Test Case 1: Successful query for products in 'Electronics' print("Test Case 1: Products in Electronics category") query1 = "SELECT name, price FROM products WHERE category = ? ORDER BY price DESC" params1 = ('Electronics',) result1 = query_product_database(query1, params1) print(json.dumps(json.loads(result1), indent=2)) # Pretty print the JSON output print("-" * 40) # Test Case 2: Successful query for a specific product print("Test Case 2: Specific product 'Office Chair'") query2 = "SELECT name, stock_quantity FROM products WHERE name = ?" params2 = ('Office Chair',) result2 = query_product_database(query2, params2) print(json.dumps(json.loads(result2), indent=2)) print("-" * 40) # Test Case 3: Query that returns no results print("Test Case 3: Non-existent category 'Toys'") query3 = "SELECT name, price FROM products WHERE category = ?" params3 = ('Toys',) result3 = query_product_database(query3, params3) print(json.dumps(json.loads(result3), indent=2)) print("-" * 40) # Test Case 4: Attempting a non-SELECT query (should be blocked) print("Test Case 4: Attempting an UPDATE query (security check)") query4 = "UPDATE products SET price = 10.00 WHERE name = ?" params4 = ('Wireless Mouse',) result4 = query_product_database(query4, params4) print(json.dumps(json.loads(result4), indent=2)) print("-" * 40) # Test Case 5: Query with a syntax error print("Test Case 5: Query with SQL syntax error") query5 = "SELEC name, price FROM products WHERE category = ?" # Deliberate typo in SELECT params5 = ('Books',) result5 = query_product_database(query5, params5) print(json.dumps(json.loads(result5), indent=2)) print("-" * 40) # Test Case 6: Query a non-existent column print("Test Case 6: Querying a non-existent column") query6 = "SELECT non_existent_column FROM products WHERE category = ?" params6 = ('Books',) result6 = query_product_database(query6, params6) print(json.dumps(json.loads(result6), indent=2)) print("-" * 40)Running these tests will help you verify that the tool behaves as expected in various scenarios, including successful queries, queries with no results, and error conditions.Integrating with an LLM AgentOnce the tool is developed and tested, it can be registered with an LLM agent framework (like LangChain, LlamaIndex, or a custom agent). The agent, when faced with a task requiring product information, would:Identify the need for the tool: Based on the user's query (e.g., "What electronics do you have under $50?") and the description of available tools, the LLM would decide that query_product_database is appropriate.Formulate the arguments: The LLM would generate the sql_query string and the parameters tuple. For the example "What electronics do you have under $50?", it might generate:sql_query: "SELECT name, price FROM products WHERE category = ? AND price < ?"parameters: ('Electronics', 50.0)Invoke the tool: The agent framework calls our query_product_database function with these arguments.Receive and process results: The JSON string returned by the tool is parsed by the LLM, which then uses the information to formulate an answer to the user.The following diagram illustrates this interaction flow:digraph G { rankdir=TB; graph [fontname="Arial", fontsize=10]; node [shape=box, style="rounded,filled", fillcolor="#e9ecef", fontname="Arial", fontsize=10]; edge [fontname="Arial", fontsize=9]; LLMAgent [label="LLM Agent", fillcolor="#a5d8ff", shape=oval]; ToolRouter [label="Tool Selection Logic", fillcolor="#ffec99"]; DBTool [label="query_product_database Tool\n(Python Function)", fillcolor="#96f2d7", peripheries=2]; ProductDB [label="Product Database\n(product_catalog.db)", shape=cylinder, fillcolor="#bac8ff"]; User [label="User Query", shape=cds, fillcolor="#ffc9c9"]; LLMResponse [label="LLM Generated Response", shape=cds, fillcolor="#ffc9c9"]; User -> LLMAgent [label="e.g., 'Find cheap electronics'"]; LLMAgent -> ToolRouter [label="Determines tool needed"]; ToolRouter -> DBTool [label="Invokes with SQL:\n'SELECT ... WHERE category=? AND price<?'\nParams: ('Electronics', 50)"]; DBTool -> ProductDB [label="Executes query"]; ProductDB -> DBTool [label="Returns rows"]; DBTool -> LLMAgent [label="Returns JSON results:\n[{'name':'Mouse', 'price':25.50}]"]; LLMAgent -> LLMResponse [label="Formats answer for user"]; }Flow of an LLM agent using the query_product_database tool to answer a user's question.This hands-on exercise demonstrates how to build a fundamental type of tool that grants LLM agents access to structured data. By defining clear interfaces, handling inputs and outputs carefully, and ensuring error management, you can create powerful and reliable tools that significantly expand the capabilities of your agents. As you build more complex agents, you'll often find yourself creating variations of such database interaction tools tailored to different data sources and requirements.