Building a ReAct agent manually provides a thorough understanding of its core control flow and interaction patterns. This practical exercise demonstrates the operational mechanics, prompt-engineering complexities, and orchestration difficulties of the thought-action-observation cycle that higher-level frameworks often abstract away.

Our goal is not to replicate a full-featured library but to implement the essential ReAct logic. We assume you have access to an LLM API (such as OpenAI's GPT models, Anthropic's Claude, or a self-hosted model) and are comfortable using Python to make API calls and handle responses.

Core Components of a Custom ReAct Agent

Implementing a ReAct agent requires several interconnected components:

- LLM Interface: A function or class that handles communication with your chosen Large Language Model. It takes a formatted prompt string and returns the model's text generation. Error handling for API calls (timeouts, rate limits) is important for a functional implementation.
- Tool Set: A collection of tools the agent can use (a minimal sketch of a tool registry and LLM interface follows this list). Each tool needs:
  - A unique name (e.g., search, calculator).
  - A clear description explaining what it does, its expected input format, and its output. This description is what the LLM relies on to decide when and how to use the tool.
  - The actual implementation (e.g., a Python function) that executes the tool's logic.
- Prompt Template: The heart of the ReAct agent. This template structures the input to the LLM and guides its reasoning process. It typically includes:
  - The initial question or task.
  - Instructions defining the ReAct format (Thought, Action, Observation).
  - Descriptions of the available tools.
  - A "scratchpad" area where the agent records its sequence of thoughts, actions, and observations from previous steps.
  - A placeholder for the current step's thought process.
- Response Parser: Logic that parses the LLM's output string. It needs to reliably identify and extract:
  - The Thought: the agent's reasoning for the next step.
  - The Action: the tool to use and the input for that tool (e.g., Action: search[query: recent advancements in LLM agents]).
  - The Final Answer: the concluding response when the agent believes the task is complete.
  Handling variations in LLM output formatting and potential parsing failures requires careful design (e.g., regular expressions, or requesting structured output such as JSON if the LLM supports it).
- Execution Loop: The main control flow that orchestrates the ReAct process iteratively:
  1. Format the prompt using the template, the current question, the tool descriptions, and the accumulated scratchpad content.
  2. Send the prompt to the LLM.
  3. Parse the LLM's response.
  4. If a Final Answer is detected, terminate and return the answer.
  5. If an Action is parsed, identify the tool name and input, execute the corresponding tool function with that input, handle potential execution errors, and format the tool's output (or error message) as an Observation.
  6. Append the Thought, Action, and Observation to the scratchpad and continue to the next iteration.
  Implement termination conditions (e.g., a maximum number of steps) to prevent infinite loops.
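To ground the LLM Interface and Tool Set components, here is a minimal sketch. The tool names, the safe_calculate and web_search helpers, and the llm_call wrapper are illustrative assumptions rather than any particular SDK's API; the wrapper only shows where retries and rate-limit backoff around your provider's client would live.

```python
import time

def safe_calculate(expression: str) -> str:
    """Illustrative calculator tool: evaluates a basic arithmetic expression."""
    try:
        # eval is restricted here for brevity; a production tool should use a real parser.
        return str(eval(expression, {"__builtins__": {}}, {}))
    except Exception as e:
        return f"Calculation error: {e}"

def web_search(query: str) -> str:
    """Placeholder search tool; swap in a real search API in practice."""
    return f"(stub) Top results for: {query}"

# Tool registry in the shape assumed by the loop sketch later in this section:
# each entry pairs a description (read by the LLM) with the callable that runs it.
tools = {
    "calculator": {
        "description": "Evaluates arithmetic expressions, e.g. calculator[2 + 2 * 3].",
        "function": safe_calculate,
    },
    "search": {
        "description": "Searches the web for a query, e.g. search[recent LLM agent papers].",
        "function": web_search,
    },
}

def llm_call(prompt: str, provider_call=None, retries: int = 3) -> str:
    """Hypothetical LLM interface wrapper: provider_call is whatever function
    actually hits your model's API (OpenAI, Anthropic, a local server, ...).
    The wrapper only adds simple retry/backoff for timeouts and rate limits."""
    if provider_call is None:
        raise NotImplementedError("Plug in your provider's API call here.")
    last_error = None
    for attempt in range(retries):
        try:
            return provider_call(prompt)
        except Exception as e:  # in practice, catch your SDK's specific error types
            last_error = e
            time.sleep(2 ** attempt)  # crude exponential backoff
    raise RuntimeError(f"LLM call failed after {retries} attempts: {last_error}")
```

In practice you would bind provider_call (for example via functools.partial) before handing llm_call to the execution loop; keeping each tool's description next to its callable also makes it easy to render the tool list into the prompt and to dispatch parsed actions by name.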
Illustrative ReAct Cycle Flow

The interaction follows a distinct pattern, often visualized as a loop:

```dot
digraph ReAct_Loop {
    rankdir=LR;
    node [shape=box, style=rounded, fontname="Arial", fontsize=10, margin=0.2];
    edge [fontname="Arial", fontsize=9];

    Start [label="Start\n(User Question)", shape=ellipse, style=filled, fillcolor="#a5d8ff"];
    FormatPrompt [label="Format Prompt\n(Question + Tools + Scratchpad)"];
    LLM [label="LLM Call", shape=cylinder, style=filled, fillcolor="#ffec99"];
    ParseResponse [label="Parse Response\n(Thought, Action/Final Answer)"];
    IsFinal [label="Final Answer?", shape=diamond, style=filled, fillcolor="#ffc9c9"];
    ExecuteTool [label="Execute Tool\n(Action Input -> Tool Function)"];
    FormatObs [label="Format Observation\n(Tool Output/Error)"];
    UpdateScratchpad [label="Update Scratchpad\n(Thought, Action, Observation)"];
    End [label="End\n(Return Final Answer)", shape=ellipse, style=filled, fillcolor="#b2f2bb"];

    Start -> FormatPrompt;
    FormatPrompt -> LLM;
    LLM -> ParseResponse;
    ParseResponse -> IsFinal;
    IsFinal -> End [label=" Yes"];
    IsFinal -> ExecuteTool [label=" No (Action)"];
    ExecuteTool -> FormatObs;
    FormatObs -> UpdateScratchpad;
    UpdateScratchpad -> FormatPrompt [label="Next Iteration"];
}
```

The ReAct agent iteratively builds context in its scratchpad: it formats the prompt, calls the LLM, parses the response, executes a tool if an action was requested, records the result as an observation, and loops until a final answer is reached.

Simplified Python Implementation Sketch

Let's outline the core loop structure in Python pseudo-code. This focuses on the flow rather than specific API or parsing details.

```python
import re  # Used for the simple regex-based response parsing below

# Assume llm_call(prompt) exists and returns the LLM's text response.
# Assume tools = {"tool_name": {"description": "...", "function": callable}} exists.

def execute_react_agent(question, tools, llm_call, max_steps=10):
    """Executes the ReAct agent loop."""
    scratchpad = ""  # Stores the thought-action-observation history
    tool_descriptions = "\n".join(
        f"- {name}: {details['description']}" for name, details in tools.items()
    )

    for step in range(max_steps):
        # 1. Format Prompt
        prompt = f"""You are an assistant that uses the ReAct framework to answer questions.

Available tools:
{tool_descriptions}

Use the following format:
Thought: Your reasoning steps.
Action: The action to take, written as tool_name[input], where tool_name is one of [{', '.join(tools.keys())}].
Observation: The result of the action.
... (this Thought/Action/Observation cycle repeats)
When you know the answer, finish with:
Final Answer: The answer to the original question.

Question: {question}
{scratchpad}Thought:"""  # Prompt the LLM to continue with its next thought

        # 2. LLM Call (ideally with "Observation:" configured as a stop sequence)
        response = llm_call(prompt).strip()

        # Record only the reasoning text under "Thought:"; any Action or
        # Final Answer lines in the response are parsed and appended separately.
        thought_text = re.split(r"Action:|Final Answer:", response)[0].strip()
        scratchpad += f"Thought: {thought_text}\n"
        print(f"--- Step {step + 1} ---")
        print(f"Thought: {thought_text}")

        # 3. Parse Response (simplified example)
        action_match = re.search(r"Action: (.*?)(?:\[(.*?)\])?$", response, re.MULTILINE)
        final_answer_match = re.search(r"Final Answer: (.*)", response, re.MULTILINE | re.DOTALL)

        # 4a. Final Answer detected: terminate and return it
        if final_answer_match:
            final_answer = final_answer_match.group(1).strip()
            print(f"Final Answer: {final_answer}")
            return final_answer

        if action_match:
            action_name = action_match.group(1).strip()
            action_input = action_match.group(2).strip() if action_match.group(2) else ""
            scratchpad += f"Action: {action_name}[{action_input}]\n"
            print(f"Action: {action_name}[{action_input}]")

            if action_name in tools:
                # 5. Execute Tool, handling potential execution errors
                try:
                    tool_function = tools[action_name]["function"]
                    observation = tool_function(action_input)
                except Exception as e:
                    observation = f"Error executing tool {action_name}: {e}"

                # 6. Format Observation & Update Scratchpad
                observation_str = str(observation)  # Ensure it's a string
                scratchpad += f"Observation: {observation_str}\n"
                print(f"Observation: {observation_str}")
            else:
                scratchpad += "Observation: Unknown tool specified.\n"
                print("Observation: Unknown tool specified.")
        else:
            # The LLM produced neither a valid Action nor a Final Answer
            scratchpad += "Observation: Invalid response format. Stopping.\n"
            print("Observation: Invalid response format. Stopping.")
            return "Agent failed due to invalid response format."

    return "Agent stopped after reaching maximum steps."

# Example usage (requires defining llm_call and tools)
# result = execute_react_agent("What is 2 + 2?", my_tools, my_llm_call)
# print(f"\nFinal Result: {result}")
```
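To see the loop run end to end without a live API, you can drive execute_react_agent (from the sketch above) with a scripted stand-in for llm_call. The canned responses and the demo calculator tool below are purely illustrative; they just follow the Thought/Action/Final Answer format the parser expects.

```python
def make_scripted_llm(responses):
    """Returns a stand-in for llm_call that replays canned responses in order."""
    iterator = iter(responses)
    def fake_llm_call(prompt: str) -> str:
        return next(iterator)
    return fake_llm_call

demo_tools = {
    "calculator": {
        "description": "Evaluates a basic arithmetic expression, e.g. calculator[2 + 2].",
        # eval is restricted here purely for the demo; use a real parser in practice.
        "function": lambda expr: str(eval(expr, {"__builtins__": {}}, {})),
    },
}

scripted_llm = make_scripted_llm([
    "I should compute this with the calculator.\nAction: calculator[2 + 2]",
    "The calculator returned 4, so I can answer directly.\nFinal Answer: 2 + 2 = 4",
])

result = execute_react_agent("What is 2 + 2?", demo_tools, scripted_llm, max_steps=5)
print(result)  # expected: 2 + 2 = 4
```

A scripted driver like this is also a convenient way to unit-test the parser and the scratchpad bookkeeping before paying for real API calls.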
Implementation Approaches

- Prompt Engineering: The structure and wording of your prompt template are significant. Clear instructions on the required format (Thought, Action, Observation), concise tool descriptions, and effective examples (if using few-shot prompting) drastically influence performance. Iterative refinement is often necessary.
- Parsing Reliability: LLM outputs can be inconsistent, and relying solely on simple string splitting or regular expressions can be brittle. Consider asking the LLM to output structured data (such as JSON) if feasible, or implement more robust parsing logic with error correction.
- Tool Input/Output: Ensure the inputs your tool functions expect match what the LLM is likely to generate. Similarly, format tool outputs clearly so the LLM can interpret them in the Observation step. Handling errors gracefully within tools and reporting them in the observation is important.
- Context Management: As the scratchpad grows, it consumes context window space and increases API costs. Implement strategies such as summarizing earlier parts of the scratchpad, or use more advanced memory techniques (covered in Chapter 3) for long-running tasks; a small trimming sketch closes this section.
- Stopping Conditions: Beyond detecting "Final Answer:" and enforcing a maximum step count, consider other stopping criteria, such as repeated actions, specific error patterns, or confidence scores if your parser or LLM provides them.

Building this custom agent, even with simplified components, illuminates the fundamental challenges and design choices inherent in creating autonomous systems. It provides a solid foundation before adopting or extending more complex agent frameworks.
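As a brief addendum to the Context Management point above, here is one simple, hypothetical way to keep the scratchpad bounded: keep the most recent steps verbatim and collapse older ones into a one-line placeholder. The trim_scratchpad helper is an illustrative sketch, not part of the loop above; in practice the placeholder line could be replaced by an LLM-generated summary of the dropped steps.

```python
def trim_scratchpad(scratchpad: str, keep_last_steps: int = 3) -> str:
    """Keeps only the most recent Thought/Action/Observation steps verbatim.

    Older steps are collapsed into a short note; a summarizing LLM call could
    replace that note for long-running tasks."""
    # Each step starts at a "Thought:" line, so split the history on that marker.
    steps = ["Thought:" + chunk for chunk in scratchpad.split("Thought:") if chunk.strip()]
    if len(steps) <= keep_last_steps:
        return scratchpad
    dropped = len(steps) - keep_last_steps
    summary_line = f"(Summary: {dropped} earlier step(s) omitted to save context.)\n"
    return summary_line + "".join(steps[-keep_last_steps:])
```

Calling such a helper on the scratchpad before formatting each prompt keeps prompt size roughly constant, at the cost of losing detail from earlier steps.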