This section provides a hands-on exercise to consolidate the concepts of multi-step planning and tool integration discussed in this chapter. We will construct a basic agent capable of decomposing a task, utilizing external tools (a simple search function and a calculator), and sequencing actions to achieve a goal. This exercise assumes familiarity with Python programming and interacting with LLM APIs.
Our objective is to build an agent that can answer questions requiring information retrieval and calculation, like "What was the approximate population difference between the host cities of the 1992 and 2000 Summer Olympics?"
Effective tool use begins with clear definitions that the LLM can understand. Each tool needs a name, a description outlining its purpose and when to use it, and the expected input format.
Let's define two simple tools in Python:
import re

# Placeholder for a real search API
def simple_search(query: str) -> str:
    """
    A simple search tool.
    Use this to find information about specific entities, events, or facts.
    Input should be a concise search query string.
    Example queries: 'capital of Japan', '1992 Summer Olympics host city'
    """
    print(f"Executing Search: {query}")
    # In a real scenario, this would call a search API (e.g., Google Search, Bing).
    # We'll use hardcoded responses for this example.
    query = query.lower()
    if "1992 summer olympics host city" in query:
        return "Barcelona"
    elif "2000 summer olympics host city" in query:
        return "Sydney"
    elif "population of barcelona" in query:
        return "Approximately 1.6 million"
    elif "population of sydney" in query:
        return "Approximately 5.3 million"
    else:
        return "Information not found."

def simple_calculator(expression: str) -> str:
    """
    A simple calculator tool.
    Use this to perform arithmetic calculations.
    Input must be a valid mathematical expression string (e.g., '5.3 - 1.6').
    It handles addition (+), subtraction (-), multiplication (*), and division (/).
    """
    print(f"Executing Calculation: {expression}")
    try:
        # Basic security: allow only numbers, operators, parentheses, and spaces.
        if not re.match(r"^[0-9\.\s\+\-\*\/\(\)]+$", expression):
            return "Error: Invalid characters in expression."
        # Evaluate the expression. Note: eval() is used here for simplicity,
        # but can be insecure. Use a safer math expression parser in production.
        result = eval(expression)
        return f"{result:.2f}"  # Format to two decimal places
    except Exception as e:
        return f"Error: Calculation failed. {str(e)}"

# Store tools in a dictionary for easy lookup
tools = {
    "Search": simple_search,
    "Calculator": simple_calculator
}

# Generate tool descriptions for the LLM prompt
tool_descriptions = ""
for name, func in tools.items():
    tool_descriptions += f"- {name}: {func.__doc__.strip()}\n"

print("Tool Descriptions for Prompt:\n", tool_descriptions)
The tool_descriptions string is significant: it becomes part of the prompt, informing the LLM about the available capabilities.
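One aside before moving on: the calculator above leans on eval() for brevity. If you prefer to avoid eval() entirely, a small AST-based evaluator restricted to arithmetic nodes is one option. The safe_calculate function below is a minimal sketch of that idea; it is not used by the agent in this exercise.

import ast
import operator

# Hypothetical drop-in alternative to eval() for simple arithmetic.
_ALLOWED_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.USub: operator.neg,
}

def safe_calculate(expression: str) -> float:
    """Evaluate simple +, -, *, / expressions without eval()."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _ALLOWED_OPS:
            return _ALLOWED_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _ALLOWED_OPS:
            return _ALLOWED_OPS[type(node.op)](_eval(node.operand))
        raise ValueError("Unsupported expression element")
    return _eval(ast.parse(expression, mode="eval"))

# Example: safe_calculate("5.3 - 1.6") returns 3.6999999999999997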
We'll implement a variation of the ReAct (Reason + Act) pattern. The agent will operate in a loop, reasoning about the next step, choosing an action (either using a tool or formulating a final answer), and observing the result.
The core interaction format looks like this:

Thought: [reasoning about the next step]
Action: ToolName[Input]   (e.g., Action: Search[1992 Summer Olympics host city])

or, once the goal is achieved:

Thought: [reasoning]
Final Answer: [Answer]

After each Action, the loop executes the requested tool and appends an Observation: line to the history before prompting the LLM again.
Let's sketch out the Python code for this agent loop. We'll need a function to interact with an LLM, represented here by a placeholder called call_llm.
# Placeholder for your LLM API call function
# Assume it takes a prompt string and returns the LLM's text response.
def call_llm(prompt: str) -> str:
    # Replace with an actual OpenAI, Anthropic, Gemini, etc. API call
    print("\n--- Sending Prompt to LLM ---")
    # Truncate prompt for display if too long
    print(prompt[:1000] + "..." if len(prompt) > 1000 else prompt)
    print("--- End Prompt ---")

    # Dummy responses simulating LLM behavior for the example task.
    # Because the prompt always contains the full history (including the initial
    # goal), we check for the most recent expected observation first and fall
    # through to earlier states.
    if "Observation: 3.70" in prompt:
        response = """
Thought: The calculator returned 3.70. This represents 3.7 million. I now have the final answer.
Final Answer: The approximate population difference between Sydney (2000 host, ~5.3 million) and Barcelona (1992 host, ~1.6 million) is 3.7 million.
"""
    elif "Observation: Approximately 5.3 million" in prompt:
        response = """
Thought: Sydney's population is ~5.3 million. Now I have both populations (5.3M for Sydney, 1.6M for Barcelona). I need to calculate the difference.
Action: Calculator[5.3 - 1.6]
"""
    elif "Observation: Sydney" in prompt:
        response = """
Thought: The 2000 host city is Sydney. Now I need its population.
Action: Search[population of Sydney]
"""
    elif "Observation: Approximately 1.6 million" in prompt:
        response = """
Thought: Barcelona's population is ~1.6 million. Now find the 2000 host city.
Action: Search[2000 Summer Olympics host city]
"""
    elif "Observation: Barcelona" in prompt:
        response = """
Thought: OK, the 1992 host city is Barcelona. Now I need its population.
Action: Search[population of Barcelona]
"""
    elif "Initial Goal" in prompt:
        response = """
Thought: The user wants the population difference between the host cities of the 1992 and 2000 Olympics.
I need to find the host city for 1992, then find its population.
Then find the host city for 2000, and find its population.
Finally, calculate the difference.
Step 1: Find the host city for 1992.
Action: Search[1992 Summer Olympics host city]
"""
    else:
        response = "Final Answer: I encountered an unexpected state."

    print("\n--- LLM Response ---")
    print(response)
    print("--- End Response ---")
    return response.strip()

def parse_llm_output(response: str) -> tuple[str, str, str]:
    """Parses the LLM response to find the Thought, Action, and Final Answer."""
    # Stop the Thought capture at the Action or Final Answer line so it does
    # not swallow the rest of the response.
    thought_match = re.search(
        r"Thought:(.*?)(?:\nAction:|\nFinal Answer:|$)",
        response, re.DOTALL | re.IGNORECASE
    )
    action_match = re.search(r"Action:\s*(\w+)\s*\[(.*)\]", response, re.DOTALL | re.IGNORECASE)
    final_answer_match = re.search(r"Final Answer:(.*)", response, re.DOTALL | re.IGNORECASE)

    thought = thought_match.group(1).strip() if thought_match else ""

    if action_match:
        tool_name = action_match.group(1).strip()
        tool_input = action_match.group(2).strip()
        return thought, tool_name, tool_input
    elif final_answer_match:
        final_answer = final_answer_match.group(1).strip()
        # Signal a final answer via the special tool name "Final Answer"
        return thought, "Final Answer", final_answer
    else:
        # If no specific action or final answer, assume it's part of reasoning or unexpected
        return thought, "No Action", ""

def run_agent(initial_goal: str, max_steps: int = 10):
    """Runs the multi-step planning agent."""
    history = f"Initial Goal: {initial_goal}\n"

    for step in range(max_steps):
        print(f"\n--- Step {step + 1} ---")

        prompt = f"""
You are an expert assistant designed to answer questions by planning steps and using available tools.
Think step-by-step to break down the goal.
You have access to the following tools:
{tool_descriptions}
Use the format:
Thought: [Your reasoning process]
Action: ToolName[Input]
or, if you have the final answer:
Thought: [Your reasoning process]
Final Answer: [The final answer]
Current Goal: {initial_goal}
Conversation History:
{history}
Your turn:
"""
        llm_response = call_llm(prompt)
        thought, tool_name, tool_input_or_answer = parse_llm_output(llm_response)
        history += f"Thought: {thought}\n"

        if tool_name == "Final Answer":
            final_answer = tool_input_or_answer
            print(f"\nFinal Answer Received: {final_answer}")
            history += f"Final Answer: {final_answer}\n"
            return final_answer, history
        elif tool_name == "No Action":
            print("Agent decided no action was needed or output format was unexpected.")
            # Potentially add fallback logic or ask for clarification.
            history += "Observation: Agent provided reasoning but no specific action or final answer.\n"
            # For this example, we'll just stop if stuck.
            return "Agent stopped: No clear next action.", history
        elif tool_name in tools:
            print(f"Action: Using tool '{tool_name}' with input '{tool_input_or_answer}'")
            history += f"Action: {tool_name}[{tool_input_or_answer}]\n"
            try:
                tool_function = tools[tool_name]
                observation = tool_function(tool_input_or_answer)
                print(f"Observation: {observation}")
                history += f"Observation: {observation}\n"
            except Exception as e:
                print(f"Error executing tool {tool_name}: {e}")
                history += f"Observation: Error executing tool {tool_name}: {str(e)}\n"
        else:
            print(f"Error: Unknown tool '{tool_name}' requested.")
            history += f"Observation: Attempted to use unknown tool '{tool_name}'.\n"

        if step == max_steps - 1:
            print("Maximum steps reached.")
            return "Agent stopped: Max steps reached.", history

    return "Agent stopped unexpectedly.", history

# --- Run the Example ---
goal = "What was the approximate population difference between the host cities of the 1992 and 2000 Summer Olympics?"
final_answer, execution_history = run_agent(goal)

print("\n--- Execution History ---")
print(execution_history)
Running the code with the example goal produces a sequence of interactions. The agent first uses the Search tool to find the host cities (Barcelona and Sydney), then uses Search again to look up their populations, and finally uses the Calculator tool to find the difference.
We can summarize the planned execution flow as: Search (1992 host city) → Search (population of Barcelona) → Search (2000 host city) → Search (population of Sydney) → Calculator (5.3 - 1.6) → Final Answer. Each step is an action, typically a tool call, that moves the agent toward the final answer.
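With the hardcoded responses above, the accumulated execution history looks roughly like this (abbreviated; a real LLM will word its thoughts differently):

Initial Goal: What was the approximate population difference between the host cities of the 1992 and 2000 Summer Olympics?
Thought: The user wants the population difference ... Step 1: Find the host city for 1992.
Action: Search[1992 Summer Olympics host city]
Observation: Barcelona
Thought: OK, the 1992 host city is Barcelona. Now I need its population.
Action: Search[population of Barcelona]
Observation: Approximately 1.6 million
...
Action: Calculator[5.3 - 1.6]
Observation: 3.70
Final Answer: The approximate population difference between Sydney (2000 host, ~5.3 million) and Barcelona (1992 host, ~1.6 million) is 3.7 million.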
Our simple implementation includes basic error reporting (e.g., "Information not found.", "Error: Calculation failed."). These error observations are appended to the history, so they appear in the LLM's context at the next step.
Consider what happens if Search[population of Barcelona] fails. The history would then include Observation: Information not found. The LLM, seeing this, should ideally adapt its plan, for example by rephrasing the query as Search[Barcelona city population] or by trying a different tool.
Implementing robust self-correction requires careful prompt engineering, potentially giving the LLM explicit instructions on how to handle errors or ambiguous tool outputs. Techniques like reflection, where the agent critiques its own plan or output based on observations, can also be integrated.
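As a minimal sketch of this idea, a hypothetical helper like add_error_hint below could be called inside run_agent immediately after the observation is recorded, nudging the model to reformulate a failed query instead of repeating it:

def add_error_hint(history: str, observation: str) -> str:
    """Append a corrective hint when a tool reports a failure.

    Hypothetical helper (not part of the agent above); call it in run_agent
    right after the Observation is added to the history.
    """
    failed = observation.startswith("Error") or "not found" in observation.lower()
    if failed:
        history += ("Hint: The previous action failed. Rephrase the query or "
                    "choose a different tool before retrying.\n")
    return history

A stronger variant would ask the LLM itself to critique the failed step (reflection) before planning the next action.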
This example provides a foundational structure; expert-level systems layer more advanced techniques on top of it.
Frameworks like LangChain, LlamaIndex, or AutoGen provide higher-level abstractions for building such agents, managing prompts, tool definitions, parsing, and execution loops. However, understanding the underlying mechanisms, as practiced here, is important for debugging, optimizing, and customizing agent behavior for complex, real-world applications.
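For comparison, wiring the same two functions into a framework-managed agent takes only a few lines. The sketch below uses the classic LangChain initialize_agent interface from older releases (since deprecated in favor of newer agent constructors); names and signatures vary across versions and an OpenAI API key is assumed, so treat it purely as an illustration:

# Illustrative only: assumes an older LangChain release and an OpenAI API key.
from langchain.agents import AgentType, Tool, initialize_agent
from langchain.chat_models import ChatOpenAI

langchain_tools = [
    Tool(name="Search", func=simple_search,
         description="Find facts about specific entities or events."),
    Tool(name="Calculator", func=simple_calculator,
         description="Evaluate simple arithmetic expressions."),
]

llm = ChatOpenAI(temperature=0)  # model choice and settings are assumptions
agent = initialize_agent(
    langchain_tools, llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,  # ReAct-style prompting, like our loop
    verbose=True,
)
agent.run(goal)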