This section provides a hands-on exercise to consolidate the concepts of multi-step planning and tool integration discussed in this chapter. We will construct a basic agent capable of decomposing a task, utilizing external tools (a simple search function and a calculator), and sequencing actions to achieve a goal. This exercise assumes familiarity with Python programming and interacting with LLM APIs.

Our objective is to build an agent that can answer questions requiring both information retrieval and calculation, such as "What was the approximate population difference between the host cities of the 1992 and 2000 Summer Olympics?"

1. Defining the Tools

Effective tool use begins with clear definitions that the LLM can understand. Each tool needs a name, a description outlining its purpose and when to use it, and the expected input format.

Let's define two simple tools in Python:

```python
import re

# Placeholder for a real search API
def simple_search(query: str) -> str:
    """
    A simple search tool. Use this to find information about specific
    entities, events, or facts. Input should be a concise search query string.
    Example queries: 'capital of Japan', '1992 Summer Olympics host city'
    """
    print(f"Executing Search: {query}")
    # In a real scenario, this would call a search API (e.g., Google Search, Bing).
    # We'll use hardcoded responses for this example.
    query = query.lower()
    if "1992 summer olympics host city" in query:
        return "Barcelona"
    elif "2000 summer olympics host city" in query:
        return "Sydney"
    elif "population of barcelona" in query:
        return "Approximately 1.6 million"
    elif "population of sydney" in query:
        return "Approximately 5.3 million"
    else:
        return "Information not found."

def simple_calculator(expression: str) -> str:
    """
    A simple calculator tool. Use this to perform arithmetic calculations.
    Input must be a valid mathematical expression string (e.g., '5.3 - 1.6').
    It handles addition (+), subtraction (-), multiplication (*), and division (/).
    """
    print(f"Executing Calculation: {expression}")
    try:
        # Basic security: allow only numbers, operators, and spaces.
        if not re.match(r"^[0-9\.\s\+\-\*\/\(\)]+$", expression):
            return "Error: Invalid characters in expression."
        # Evaluate the expression. Note: eval() is used here for simplicity,
        # but can be insecure. Use a safer math expression parser in production.
        result = eval(expression)
        return f"{result:.2f}"  # Format to two decimal places
    except Exception as e:
        return f"Error: Calculation failed. {str(e)}"

# Store tools in a dictionary for easy lookup
tools = {
    "Search": simple_search,
    "Calculator": simple_calculator,
}

# Generate tool descriptions for the LLM prompt
tool_descriptions = ""
for name, func in tools.items():
    tool_descriptions += f"- {name}: {func.__doc__.strip()}\n"

print("Tool Descriptions for Prompt:\n", tool_descriptions)
```

The tool_descriptions string is significant: it will be part of the prompt, informing the LLM about the available capabilities.
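Before moving on, a note on the eval() call in simple_calculator. If you want to avoid eval() entirely, one option is to interpret the expression's abstract syntax tree yourself. The sketch below is a minimal version of that idea, assuming only the four basic operators (plus unary minus) are needed; safe_eval and _OPS are names introduced here for illustration, not part of the agent code.

```python
import ast
import operator

# Minimal safe arithmetic evaluator: walks the expression's AST and refuses
# anything that is not a number or a whitelisted arithmetic operation.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}

def safe_eval(expression: str) -> float:
    def _eval(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        # Anything else (function calls, names, attributes, ...) is rejected.
        raise ValueError(f"Disallowed expression element: {ast.dump(node)}")

    return _eval(ast.parse(expression, mode="eval"))
```

With this in place, simple_calculator could call safe_eval(expression) instead of eval(expression); the existing try/except would still catch division by zero and malformed input.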
2. Designing the Agent Loop

We'll implement a variation of the ReAct (Reason + Act) pattern. The agent operates in a loop, reasoning about the next step, choosing an action (either using a tool or formulating a final answer), and observing the result.

The core logic looks like this:

1. Initialize: Start with the user's goal and an empty history.
2. Reason/Plan: Send the goal and history to the LLM. Prompt it to think step by step and either decide on the next action or provide the final answer. The prompt must include the tool descriptions.
3. Parse Action: Extract the intended action from the LLM's response. This will be either `Action: ToolName[Input]` (e.g., `Action: Search[1992 Summer Olympics host city]`) or `Final Answer: [Answer]`.
4. Execute Action: If it's a tool action, validate the tool name and input, then execute the corresponding function. If it's the final answer, terminate the loop.
5. Observe: Record the result of the action (tool output or error message).
6. Update History: Append the reasoning, action, and observation to the history.
7. Repeat: Go back to step 2 unless a final answer was given or the step limit is reached.

3. Implementing the Agent

Let's sketch out the Python code for this agent loop. We'll need a function to interact with an LLM, represented here by a placeholder, call_llm.

```python
# Placeholder for your LLM API call function.
# Assume it takes a prompt string and returns the LLM's text response.
def call_llm(prompt: str) -> str:
    # Replace with an actual OpenAI, Anthropic, Gemini, etc. API call.
    print("\n--- Sending Prompt to LLM ---")
    # Truncate the prompt for display if it is too long.
    print(prompt[:1000] + "..." if len(prompt) > 1000 else prompt)
    print("--- End Prompt ---")

    # Dummy responses simulating LLM behavior for the example task.
    # Every prompt contains "Initial Goal", so the branches are checked from
    # the LATEST observation back to the earliest; the first match reflects
    # the current state. (Checking "Initial Goal" first would loop forever.)
    if "Observation: 3.70" in prompt:
        response = """
Thought: The calculator returned 3.70. This represents 3.7 million. I now have the final answer.
Final Answer: The approximate population difference between Sydney (2000 host, ~5.3 million) and Barcelona (1992 host, ~1.6 million) is 3.7 million.
"""
    elif "Observation: Approximately 5.3 million" in prompt:
        response = """
Thought: Sydney's population is ~5.3 million. Now I have both populations (5.3M for Sydney, 1.6M for Barcelona). I need to calculate the difference.
Action: Calculator[5.3 - 1.6]
"""
    elif "Observation: Sydney" in prompt:
        response = """
Thought: The 2000 host city is Sydney. Now I need its population.
Action: Search[population of Sydney]
"""
    elif "Observation: Approximately 1.6 million" in prompt:
        response = """
Thought: Barcelona's population is ~1.6 million. Now find the 2000 host city.
Action: Search[2000 Summer Olympics host city]
"""
    elif "Observation: Barcelona" in prompt:
        response = """
Thought: OK, the 1992 host city is Barcelona. Now I need its population.
Action: Search[population of Barcelona]
"""
    elif "Initial Goal" in prompt:
        response = """
Thought: The user wants the population difference between the host cities of the 1992 and 2000 Olympics. I need to find the host city for 1992 and its population, then the host city for 2000 and its population, and finally calculate the difference.
Step 1: Find the host city for 1992.
Action: Search[1992 Summer Olympics host city]
"""
    else:
        response = "Final Answer: I encountered an unexpected state."

    print("\n--- LLM Response ---")
    print(response)
    print("--- End Response ---")
    return response.strip()
```
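For the exercise, the hardcoded branches above are enough. When you wire in a real model, call_llm can be a thin wrapper around any chat-completion API. Here is a minimal sketch, assuming the openai Python SDK (v1 or later) and an OPENAI_API_KEY environment variable; the model name is an illustrative placeholder, not a requirement.

```python
# A minimal real call_llm, assuming the openai Python SDK (v1+). Any
# chat-capable provider works the same way: send the prompt as a user
# message, return the text of the first completion.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def call_llm(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",  # assumption: substitute your preferred chat model
        messages=[{"role": "user", "content": prompt}],
        temperature=0,   # low temperature keeps the output format stable
    )
    return response.choices[0].message.content.strip()
```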
print("\n--- LLM Response ---") print(response) print("--- End Response ---") return response.strip() def parse_llm_output(response: str) -> tuple[str, str, str]: """Parses LLM response to find Thought, Action, and Final Answer.""" thought_match = re.search(r"Thought:(.*)", response, re.DOTALL | re.IGNORECASE) action_match = re.search(r"Action:\s*(\w+)\s*\[(.*)\]", response, re.DOTALL | re.IGNORECASE) final_answer_match = re.search(r"Final Answer:(.*)", response, re.DOTALL | re.IGNORECASE) thought = thought_match.group(1).strip() if thought_match else "" if action_match: tool_name = action_match.group(1).strip() tool_input = action_match.group(2).strip() return thought, tool_name, tool_input elif final_answer_match: final_answer = final_answer_match.group(1).strip() # Indicate final answer by returning None for tool name/input return thought, "Final Answer", final_answer else: # If no specific action or final answer, assume it's part of reasoning or unexpected return thought, "No Action", "" def run_agent(initial_goal: str, max_steps: int = 10): """Runs the multi-step planning agent.""" history = f"Initial Goal: {initial_goal}\n" for step in range(max_steps): print(f"\n--- Step {step + 1} ---") prompt = f""" You are an expert assistant designed to answer questions by planning steps and using available tools. Think step-by-step to break down the goal. You have access to the following tools: {tool_descriptions} Use the format: Thought: [Your reasoning process] Action: [ToolName][Input] or, if you have the final answer: Thought: [Your reasoning process] Final Answer: [The final answer] Current Goal: {initial_goal} Conversation History: {history} Your turn: """ llm_response = call_llm(prompt) thought, tool_name, tool_input_or_answer = parse_llm_output(llm_response) history += f"Thought: {thought}\n" if tool_name == "Final Answer": final_answer = tool_input_or_answer print(f"\nFinal Answer Received: {final_answer}") history += f"Final Answer: {final_answer}\n" return final_answer, history elif tool_name == "No Action": print("Agent decided no action was needed or output format was unexpected.") # Potentially add fallback logic or ask for clarification history += "Observation: Agent provided reasoning but no specific action or final answer.\n" # For this example, we'll just stop if stuck return "Agent stopped: No clear next action.", history elif tool_name in tools: print(f"Action: Using tool '{tool_name}' with input '{tool_input_or_answer}'") history += f"Action: {tool_name}[{tool_input_or_answer}]\n" try: tool_function = tools[tool_name] observation = tool_function(tool_input_or_answer) print(f"Observation: {observation}") history += f"Observation: {observation}\n" except Exception as e: print(f"Error executing tool {tool_name}: {e}") history += f"Observation: Error executing tool {tool_name}: {str(e)}\n" else: print(f"Error: Unknown tool '{tool_name}' requested.") history += f"Observation: Attempted to use unknown tool '{tool_name}'.\n" if step == max_steps - 1: print("Maximum steps reached.") return "Agent stopped: Max steps reached.", history return "Agent stopped unexpectedly.", history # --- Run the Example --- goal = "What was the approximate population difference between the host cities of the 1992 and 2000 Summer Olympics?" final_answer, execution_history = run_agent(goal) print("\n--- Execution History ---") print(execution_history)4. Example Trace and VisualizationRunning the code with the example goal produces a sequence of interactions. 
6. Further Development

This example provides a foundational structure. Expert-level systems often incorporate more advanced techniques:

- Hierarchical Planning: Breaking down the primary goal into sub-goals, each potentially requiring its own sub-plan and tool usage. This is essential for very complex tasks.
- Dynamic Tool Selection: Using the LLM not just to format the tool input but also to select the most appropriate tool from a larger set based on the current sub-task.
- Sophisticated Error Handling: Implementing retry logic with backoff, parsing specific error codes, or even invoking a debugging tool or sub-agent.
- State Management: Explicitly managing the agent's internal state, potentially using structured formats or the memory modules discussed in Chapter 3.
- Structured Output: Using LLMs that support constrained decoding or function calling to get more reliable JSON outputs for tool actions, reducing parsing errors (see the sketch at the end of this section).

Frameworks like LangChain, LlamaIndex, or AutoGen provide higher-level abstractions for building such agents, managing prompts, tool definitions, parsing, and execution loops. However, understanding the underlying mechanisms, as practiced here, is important for debugging, optimizing, and customizing agent behavior for complex applications.
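Finally, to make the Structured Output bullet above concrete, here is a minimal sketch of the Calculator tool exposed through OpenAI-style function calling. Everything in it is an assumption for illustration (the openai SDK v1+, the model name, the schema shape), not part of the agent built above.

```python
import json

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

calculator_tool = {
    "type": "function",
    "function": {
        "name": "Calculator",
        "description": "Perform arithmetic. Input is a math expression string, e.g. '5.3 - 1.6'.",
        "parameters": {
            "type": "object",
            "properties": {"expression": {"type": "string"}},
            "required": ["expression"],
        },
    },
}

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model name; any tool-calling chat model works
    messages=[{"role": "user", "content": "What is 5.3 - 1.6?"}],
    tools=[calculator_tool],
    # Force a call to our function so the reply is a structured tool call.
    tool_choice={"type": "function", "function": {"name": "Calculator"}},
)

tool_call = response.choices[0].message.tool_calls[0]
args = json.loads(tool_call.function.arguments)  # arguments arrive as a JSON string
print(tool_call.function.name, args["expression"])
```

Because the arguments arrive as validated JSON, the fragile regex in parse_llm_output is no longer needed for tool actions.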