Now, let's put theory into practice. You've learned about common issues, systematic iteration, and analysis techniques for agent prompts. This hands-on exercise is designed to walk you through a debugging and refinement process for a simplified agentic workflow that isn't performing as expected. We'll identify issues, propose changes to the prompts, and discuss how these changes lead to better outcomes.
Imagine we have an agent, ReportCraftAI, designed to help create summaries of recent news articles on a specific topic. Its intended workflow is to search for recent articles on the topic, extract a concise summary from each, and compile the headlines and summaries into a report.
However, users have reported that ReportCraftAI often returns only one article, sometimes irrelevant ones, and the summaries can be poorly extracted.
Our task is to diagnose the problems by examining its prompts and refine them to improve its performance. We'll assume the agent has access to a search_web(query: str, time_period: str, num_results: int) tool and an extract_summary(article_text: str) tool.
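To keep the exercise concrete, here is a minimal sketch of what these two tools might look like in Python. Only the signatures come from the exercise; the Article shape, the function bodies, and the placeholder data are assumptions for local experimentation.

from typing import TypedDict

class Article(TypedDict):
    title: str
    content: str

def search_web(query: str, time_period: str, num_results: int) -> list[Article]:
    # Stand-in for a real news-search API: returns placeholder
    # articles so the workflow can be exercised locally.
    return [
        {"title": f"Result {i + 1} for '{query}' ({time_period})",
         "content": "Detailed text..."}
        for i in range(num_results)
    ]

def extract_summary(article_text: str) -> str:
    # Stand-in for an LLM-backed summarizer; naive truncation keeps
    # the sketch self-contained.
    return article_text[:100]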
Let's assume the core system prompt for ReportCraftAI includes something like this when tasked with a topic, say "advancements in renewable energy":
Initial System Prompt Segment:
You are ReportCraftAI, a helpful assistant.
Your goal is to create a report on recent news.
Current Task: User wants a report on "advancements in renewable energy".
Available Tools:
1. search_web(query: str, time_period: str, num_results: int): Searches the web for news. time_period can be "past_week", "past_month".
2. extract_summary(article_text: str): Extracts a concise summary from an article.
Plan:
1. Search for news on the topic.
2. Extract information.
3. Compile the report.
Begin.
The agent then uses this to generate further prompts for itself or to decide on tool use. A likely first internal prompt or thought process for step 1 (Search) might be influenced by:
Initial Agent "Thought" for Search:
"I need to find news on 'advancements in renewable energy'. I should use the search_web
tool."
Resulting Tool Call Prompt (Implicit or Explicit):
search_web(query="advancements in renewable energy news", time_period="recent", num_results=1)
Observed Problems:
1. The agent may default to num_results=1 (or another low number) if not explicitly guided, so only a single article is fetched.
2. The term "recent" might be too vague for the time_period parameter if the tool expects specific values such as "past_week" or "past_month", which can lead to irrelevant or outdated results.
3. If the implicit prompt driving extract_summary is just "Summarize this article," the quality can be inconsistent.

If we had logs, we might see something like this (simplified):
Thought: User wants a report on "advancements in renewable energy". I need to find articles.
Action: search_web(query="advancements in renewable energy news", time_period="recent", num_results=1)
Observation: [Article1: "Solar Power Breakthrough...", Content: "Detailed text..."]
Thought: I have one article. Now I need to summarize it.
Action: extract_summary(article_text="Detailed text...")
Observation: "Solar power is advancing."
Thought: I have the summary. I will compile the report.
Output: Report: Article 1: Solar Power Breakthrough. Summary: Solar power is advancing.
This trace clearly shows the agent only fetching one article and producing a very brief summary.
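Traces like this are typically produced by a simple thought-action-observation loop. The skeleton below is a hypothetical sketch of such a loop, not ReportCraftAI's actual implementation; call_llm is a placeholder for a real model call, and the eval-based dispatch is for illustration only.

def call_llm(prompt: str) -> str:
    # Placeholder for a real model call. Assumed to return a line
    # beginning with "Thought:", "Action:", or "Output:".
    raise NotImplementedError("plug in your model here")

def run_agent(system_prompt: str, max_steps: int = 10) -> str:
    transcript = system_prompt
    for _ in range(max_steps):
        response = call_llm(transcript)
        transcript += "\n" + response          # keep the full history in context
        if response.startswith("Output:"):
            # The agent has produced its final report.
            return response
        if response.startswith("Action:"):
            # Execute the named tool call and feed the result back as an
            # observation. eval() is for the sketch only; a real
            # implementation should parse and dispatch tool calls safely.
            result = eval(response.removeprefix("Action: "))
            transcript += f"\nObservation: {result}"
    return transcript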
The initial system prompt is too high-level and doesn't provide enough constraints for the search task. We need to be more specific.
Revised System Prompt Segment (Focus on Task Definition):
You are ReportCraftAI, a helpful assistant.
Your goal is to create a report on recent news.
Current Task: User wants a report on "advancements in renewable energy".
Specific Instructions:
- Find exactly three (3) relevant news articles.
- Articles must be published within the "past_week".
- Focus on significant developments or announcements.
Available Tools:
1. search_web(query: str, time_period: str, num_results: int): Searches the web for news. time_period must be "past_week" or "past_month".
2. extract_summary(article_text: str, desired_length_words: int): Extracts a concise summary from an article to a desired word length.
Plan:
1. Formulate a precise search query based on the topic and instructions.
2. Use search_web to find 3 articles from the past_week.
3. For each article, use extract_summary to get a 50-word summary.
4. Compile the headlines and summaries into a report.
Begin.
Reasoning for Changes:
1. Added Specific Instructions with explicit targets: exactly three articles, published within the past week, focused on significant developments.
2. Specified the allowed values for the num_results and time_period parameters of the search_web tool, removing the ambiguity that led to vague calls.
3. Made extract_summary more controllable by adding desired_length_words.

With these changes, the agent's internal "thought" process for the search becomes more constrained, leading to a better tool call:
Improved Agent "Thought" for Search:
"I need to find 3 recent articles (past_week) on 'significant advancements in renewable energy'. I will use search_web
."
Resulting Tool Call Prompt (Implicit or Explicit):
search_web(query="significant advancements in renewable energy", time_period="past_week", num_results=3)
This is a significant improvement: it directly addresses Problem 1 (only one article) and helps with Problem 2 (irrelevant or outdated results).
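Beyond stating the constraints in the prompt, you can also enforce them in code by validating arguments before a tool call executes. The sketch below is one possible approach; the error strings and the self-correction idea are assumptions, not part of the original tool definitions.

VALID_TIME_PERIODS = {"past_week", "past_month"}

def validated_search_web(query: str, time_period: str, num_results: int):
    # Catch vague values like "recent" before they reach the tool and
    # return the error text so the agent can self-correct on the next step.
    if time_period not in VALID_TIME_PERIODS:
        return f"Error: time_period must be one of {sorted(VALID_TIME_PERIODS)}."
    if num_results < 1:
        return "Error: num_results must be at least 1."
    return search_web(query, time_period, num_results)

Returning the error text as an observation gives the agent a chance to retry with valid parameters instead of failing silently.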
Previously, the extract_summary tool might have been called with minimal instruction. The revised system prompt now guides the agent to use the new desired_length_words parameter.
Original Implicit Prompt to extract_summary (derived from "Extract information"):
"Summarize this article text: [article content]"
Revised Prompt for extract_summary (derived from "use extract_summary to get a 50-word summary"):
"Extract a summary of approximately 50 words from the following text, focusing on the main findings: [article content]"
Or, if the agent directly calls the tool based on the plan:
extract_summary(article_text="[article content]", desired_length_words=50)
Reasoning for Changes:
This directly addresses Problem 3, leading to more consistent and useful summaries.
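A sketch of what the updated tool might look like, assuming a hypothetical complete(prompt) helper that calls the summarization model:

def complete(prompt: str) -> str:
    # Placeholder for a call to your summarization model.
    raise NotImplementedError

def extract_summary(article_text: str, desired_length_words: int = 50) -> str:
    # Embed the length target in the prompt itself so the model sees
    # the constraint on every call.
    prompt = (
        f"Extract a summary of approximately {desired_length_words} words "
        f"from the following text, focusing on the main findings:\n\n{article_text}"
    )
    return complete(prompt)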
Let's say after these changes, the agent now reliably gets three articles and the summaries are better, but sometimes one of the articles is an opinion piece rather than a news report. The initial instruction "Focus on significant developments or announcements" was a good start, but we can refine the prompt for query generation further.
We could try a variation in the system prompt's instructions for search:
System Prompt Variation (Search Instruction): "...Focus on factual news reports about significant developments or announcements, avoiding opinion pieces or blog posts."
Comparing Variations: To test this, you would run the agent with both the previous prompt and this new variation on several different topics. You'd then compare the outputs: how many of the returned articles are factual news reports versus opinion pieces or blog posts, and whether article relevance and summary quality hold up across topics.
This is where techniques like A/B testing prompts become useful. If this variation consistently reduces irrelevant articles without harming other aspects, it's a good candidate for adoption.
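A minimal harness for such a comparison might look like the sketch below. run_agent is the loop sketched earlier, score_output is a hypothetical evaluation routine (a human rating or an LLM judge), and the topics are illustrative.

INSTRUCTION_A = "- Focus on significant developments or announcements."
INSTRUCTION_B = ("- Focus on factual news reports about significant developments "
                 "or announcements, avoiding opinion pieces or blog posts.")

def build_prompt(instruction: str, topic: str) -> str:
    # Assemble the system prompt with the search instruction under test.
    return (
        "You are ReportCraftAI, a helpful assistant.\n"
        f"Current Task: User wants a report on \"{topic}\".\n"
        f"Specific Instructions:\n{instruction}\n"
        "...\nBegin."
    )

def score_output(report: str) -> float:
    # Hypothetical scorer: a human rating or an LLM judge that checks
    # relevance and whether any article is an opinion piece.
    raise NotImplementedError

def ab_test(topics: list[str]) -> None:
    for topic in topics:
        for name, instruction in [("A", INSTRUCTION_A), ("B", INSTRUCTION_B)]:
            report = run_agent(build_prompt(instruction, topic))
            print(f"{topic} | variant {name}: {score_output(report):.2f}")

# Example: ab_test(["advancements in renewable energy", "quantum computing", "gene therapy"])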
A simple before-and-after comparison illustrates the shift in the agent's process: the initial workflow (vague instructions, a single and possibly irrelevant article, a terse summary) versus the refined workflow (specific instructions, three past-week articles, focused 50-word summaries). This contrast is the direct result of improving the agent's guiding prompts.
Throughout this debugging process, detailed logging would be invaluable. Imagine logs capturing each intermediate thought, every tool call with its exact parameters, the raw tool outputs, and the final compiled report.
For example, if the search_web tool returned an error message or unexpected data, logs would help pinpoint whether the issue was the tool itself or how the agent prompted it. If extract_summary consistently produced summaries that were too short despite the 50-word request, logs would help investigate whether the tool respected the parameter or the input text was too brief.
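A lightweight way to obtain such logs is to record every step of the agent loop as a structured JSON line; the exact fields below are a suggestion, not a standard:

import json
import time

def log_step(step_type: str, content: str, logfile: str = "agent_trace.jsonl") -> None:
    # One JSON record per thought, action, or observation makes traces
    # easy to filter and to diff across prompt versions.
    record = {"timestamp": time.time(), "type": step_type, "content": content}
    with open(logfile, "a") as f:
        f.write(json.dumps(record) + "\n")

# Inside the agent loop:
#   log_step("action", 'search_web(query="...", time_period="past_week", num_results=3)')
#   log_step("observation", str(result))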
By reviewing these logs, you can systematically identify which parts of your prompt chain are weak and require further refinement. Organizing your prompts with version control (e.g., using Git for prompt files or a dedicated prompt management system) allows you to track changes, revert if a new prompt performs worse, and manage different versions for A/B testing.
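As a sketch of that idea, prompts can live one-per-file in a Git repository and be loaded by version; the layout and loader below are one possible convention, not a prescribed one:

from pathlib import Path

# Suggested layout (one file per prompt version):
#   prompts/report_craft/v1_system.txt   <- initial system prompt
#   prompts/report_craft/v2_system.txt   <- revised prompt with specific instructions

def load_prompt(agent: str, version: str, base_dir: str = "prompts") -> str:
    # With one file per version, `git log` and `git diff` show exactly
    # how a prompt evolved between experiments.
    return (Path(base_dir) / agent / f"{version}_system.txt").read_text()

# system_prompt = load_prompt("report_craft", "v2")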
This hands-on exercise simulated a common scenario in developing agentic workflows. The key is not just to write prompts, but to treat them as a core part of your system that requires testing, analysis, and iterative refinement. By applying the principles from this chapter, you can significantly enhance the reliability and performance of your AI agents.