Once an AI agent has formulated an initial plan, the task is far from over. A significant next step is determining whether that plan is sound, achievable, and truly aligned with the intended goal. This section explores how carefully constructed prompts can empower an agent to perform this self-assessment, scrutinizing its own plans for viability and quality. Without this internal review, agents might embark on flawed or inefficient paths, wasting resources and failing to deliver desired outcomes.
Before an agent can evaluate a plan, it needs criteria for judgment. We can broadly categorize these into viability and quality.
A viable plan is one the agent can realistically execute. This means:

- Every step is clearly defined and actionable with the agent's current capabilities and tools.
- The resources each step requires (APIs, credentials, budget) are actually available.
- The plan respects any hard constraints the agent operates under.
Plan quality, on the other hand, refers to how well the plan achieves the goal. Aspects of quality include:

- Completeness: every sub-requirement of the goal is addressed.
- Efficiency: no redundant steps, and no unnecessarily slow orderings.
- Robustness: likely failure modes are anticipated and handled.
Prompt engineering allows us to guide the agent in considering these factors systematically.
Several prompting techniques can be employed to encourage an agent to evaluate its generated plans:
Self-Critique Prompts

One of the most direct ways to have an agent evaluate its plan is to instruct it to critique its own work. This involves providing the generated plan back to the agent (or keeping it in context) and asking specific questions about its strengths and weaknesses.
For example, after an agent generates a plan:
[Generated Plan]
...
Step 1: Action A
Step 2: Action B
Step 3: Action C
...
Now, critically review this plan:
1. Is each step clearly defined and actionable with your current capabilities?
2. Are there any ambiguous instructions or dependencies between steps that are not addressed?
3. Does the plan directly lead to achieving the primary goal: '[Original Goal]'?
4. Identify up to 3 potential weaknesses or areas for improvement in this plan.
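Assembling such a critique prompt is usually done in the orchestration code rather than by hand. A minimal sketch, assuming the template above (the `build_self_critique_prompt` helper is hypothetical, and the resulting string would be sent to whatever model client you use):

```python
def build_self_critique_prompt(plan: str, goal: str) -> str:
    """Wrap a generated plan in the self-critique template shown above."""
    return (
        f"[Generated Plan]\n{plan}\n\n"
        "Now, critically review this plan:\n"
        "1. Is each step clearly defined and actionable with your current capabilities?\n"
        "2. Are there any ambiguous instructions or dependencies between steps that are not addressed?\n"
        f"3. Does the plan directly lead to achieving the primary goal: '{goal}'?\n"
        "4. Identify up to 3 potential weaknesses or areas for improvement in this plan."
    )

prompt = build_self_critique_prompt(
    "Step 1: Action A\nStep 2: Action B\nStep 3: Action C",
    "ship the quarterly report",
)
```

Keeping the template in one function makes it easy to version and A/B test the critique questions separately from the rest of the workflow.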
You can also provide a checklist or a rubric directly in the prompt, guiding the agent's attention to specific attributes you want it to assess.
Constraint Checking Prompts

If the agent's operation is bound by specific rules or limitations, these must be re-verified against the generated plan. Prompts can explicitly list these constraints and ask the agent to confirm compliance.
Consider an agent planning a social media campaign with a budget limit:
Original Goal: Plan a 7-day social media campaign for product X with a maximum budget of $500.
[Generated Plan]
...
Day 1: Launch ad A (Estimated cost: $100)
Day 2: Boost post B (Estimated cost: $50)
...
Day 7: Run contest C (Estimated cost: $150)
Total Estimated Cost: $550
Review the plan above. The maximum budget is $500.
1. Does the plan adhere to this budget constraint?
2. If not, identify the step(s) causing the overage and suggest modifications to bring the plan within budget.
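Numeric constraints like this one can also be double-checked deterministically in the controlling code, rather than trusting the model's arithmetic. A small sketch that sums the cost annotations used in the plan format above (the annotation pattern is an assumption based on this example):

```python
import re

def check_budget(plan: str, budget: float) -> tuple[bool, float]:
    """Sum '(Estimated cost: $N)' annotations and compare against the budget."""
    costs = [float(m) for m in re.findall(r"Estimated cost: \$(\d+(?:\.\d+)?)", plan)]
    total = sum(costs)
    return total <= budget, total

plan = (
    "Day 1: Launch ad A (Estimated cost: $100)\n"
    "Day 2: Boost post B (Estimated cost: $50)\n"
    "Day 7: Run contest C (Estimated cost: $150)\n"
)
within_budget, total = check_budget(plan, 500)  # only the three listed days are summed here
```

Pairing a prompt-based check with a programmatic one like this catches the common case where the model's stated total does not match the sum of its own line items.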
Resource Assessment Prompts

A plan might look good on paper but fail if necessary resources aren't available. Prompts can ask the agent to inventory required resources against available ones for each step.
Example for an agent using external APIs:
[Generated Plan]
Step 1: Retrieve user data using UserProfileAPI.
Step 2: Analyze sentiment using SentimentAnalysisAPI.
Step 3: Summarize findings and store in Database.
For the plan above:
1. List all external tools or APIs required for each step.
2. Confirm that you have valid credentials and sufficient rate limits for each identified API.
3. If any resource is unavailable or insufficient, note it as a critical issue.
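The availability half of this check is often better done outside the model, against the system's actual tool registry. A minimal sketch, assuming a simple mapping from tool name to availability (the registry shape is hypothetical):

```python
def check_resources(required: list[str], available: dict[str, bool]) -> list[str]:
    """Return the required tools/APIs that are missing or lack valid credentials."""
    return [tool for tool in required if not available.get(tool, False)]

required = ["UserProfileAPI", "SentimentAnalysisAPI", "Database"]
available = {"UserProfileAPI": True, "SentimentAnalysisAPI": False, "Database": True}
missing = check_resources(required, available)  # flags SentimentAnalysisAPI as a critical issue
```

A common split is to let the prompt extract the list of required tools from the plan, then verify that list programmatically.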
Risk Identification Prompts

Proactive risk assessment can prevent failures. Prompts can guide agents to think about what might go wrong and how to handle it. This touches upon creating plans that are more resilient to issues.
Example:
[Generated Plan]
...
Consider the potential risks associated with this plan:
1. For each step, identify one or two potential failure modes (e.g., API timeout, unexpected data format, tool malfunction).
2. For each identified risk, suggest a brief contingency action or an error handling strategy.
3. Are there any steps that are particularly high-risk?
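The contingency actions the agent proposes can then be wired into execution. A minimal sketch of the pattern, assuming each step carries a primary action and a fallback (the function names are illustrative):

```python
def run_with_contingency(step_name, action, fallback):
    """Try a step's primary action; on failure, run its contingency instead."""
    try:
        return action()
    except Exception:
        # e.g., API timeout or tool malfunction, as identified in the risk review
        return fallback()

def flaky_api_call():
    raise TimeoutError("API timeout")

result = run_with_contingency("Step 1", flaky_api_call, lambda: "cached user data")
```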
Efficiency and Optimality Prompts

While not always the primary concern, efficiency can be important. You can prompt an agent to look for ways to streamline its plan.
Example:
[Generated Plan]
...
Evaluate this plan for efficiency:
1. Are there any redundant steps that can be removed without affecting the outcome?
2. Could any steps be reordered or parallelized to achieve the goal faster?
3. Suggest a more efficient alternative if one exists.
Completeness Checks

It's easy for an LLM to miss a sub-requirement of a complex goal. A completeness check prompt ensures all aspects are addressed.
Example:
User Request: 'Book a round-trip flight from New York to London, find a pet-friendly hotel near Hyde Park for 3 nights, and arrange airport transfers for two people.'
[Generated Plan]
- Step 1: Search flights NYC-LON.
- Step 2: Book selected flight.
- Step 3: Search hotels near Hyde Park.
- Step 4: Book selected hotel.
Review the plan against the original user request:
1. Does the plan address all parts of the user's request (flight, hotel, transfers, pet-friendly, number of people)?
2. Identify any missing components or requirements not covered by the plan.
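A crude but useful pre-filter for this check can run before the prompt: scan the plan text for each requested component and only invoke the model for the ambiguous cases. A naive keyword sketch (real systems would use the evaluation prompt above for anything this misses):

```python
def missing_components(request_parts: list[str], plan: str) -> list[str]:
    """Naive keyword check: which requested components does the plan not mention?"""
    plan_lower = plan.lower()
    return [part for part in request_parts if part.lower() not in plan_lower]

parts = ["flight", "hotel", "transfers", "pet-friendly"]
plan = (
    "Step 1: Search flights NYC-LON.\n"
    "Step 2: Book selected flight.\n"
    "Step 3: Search hotels near Hyde Park.\n"
    "Step 4: Book selected hotel."
)
gaps = missing_components(parts, plan)  # transfers and pet-friendly are not covered
```

Keyword matching will miss paraphrases, which is exactly why the prompt-based completeness check remains the primary mechanism; the code check is a cheap first pass.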
Evaluating a plan isn't usually a single, final step. More often, it's part of an iterative loop: the agent generates a plan, evaluates it, and then uses the evaluation feedback to refine the plan. This cycle can repeat until the plan meets the desired standards of viability and quality.
The output from an evaluation prompt (e.g., identified flaws, unmet constraints, or suggested improvements) can be directly incorporated into a subsequent prompt that asks the agent to revise its previous plan.
For instance, if an evaluation reveals a budget overrun:
Previous Plan (Cost: $550):
...
Evaluation Feedback: Plan exceeds budget of $500. Over by $50.
Revise the previous plan to meet the budget constraint of $500. Incorporate the evaluation feedback. Explain your changes.
This iterative process is fundamental to building more reliable and intelligent agents. We can visualize this loop as follows:
A diagram showing the iterative cycle of plan generation, evaluation, and refinement in an agentic system.
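The generate-evaluate-refine cycle can be sketched as a simple bounded loop. Here `generate`, `evaluate`, and `refine` stand in for LLM-backed calls (stubbed in any real test); the loop structure and the `max_rounds` cutoff are the point:

```python
def plan_until_viable(goal, generate, evaluate, refine, max_rounds=3):
    """Generate a plan, then evaluate and refine it until no issues remain
    or the round budget is exhausted."""
    plan = generate(goal)
    for _ in range(max_rounds):
        issues = evaluate(plan, goal)  # e.g., unmet constraints, missing components
        if not issues:
            return plan
        plan = refine(plan, issues)    # feed evaluation feedback into a revision prompt
    return plan  # best effort after max_rounds
```

Capping the number of rounds matters in practice: without it, a model that keeps finding minor nits can loop indefinitely.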
Beyond qualitative feedback, it's often useful to prompt the agent to provide a quantitative or structured assessment of its plan. This can help the overall system decide whether to proceed with the plan, request further revisions, or perhaps abandon the current approach if the plan quality is consistently low.
You can ask for:

- A numeric confidence score (e.g., 1-10) reflecting how likely the plan is to succeed.
- A structured list of identified issues, each tagged with a severity (e.g., critical or minor).
- A simple pass/fail verdict for each stated constraint.
These structured outputs can be parsed by the controlling logic of the agentic workflow to make automated decisions. For example, a plan with a confidence score below 7 might automatically trigger a re-planning cycle with more explicit guidance.
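If the evaluation prompt asks for JSON, the controlling logic can implement the "confidence below 7 triggers re-planning" rule directly. A minimal sketch (the response schema with `confidence` and `critical_issues` keys is an assumption you would specify in the prompt):

```python
import json

def should_replan(evaluation_json: str, threshold: int = 7) -> bool:
    """Parse the agent's structured self-assessment and decide whether to re-plan."""
    report = json.loads(evaluation_json)
    low_confidence = report.get("confidence", 0) < threshold
    has_critical = bool(report.get("critical_issues"))
    return low_confidence or has_critical

raw = '{"confidence": 5, "critical_issues": ["exceeds budget"]}'
replan = should_replan(raw)  # low confidence and a critical issue: re-plan
```

In production you would also handle malformed JSON (e.g., retry the evaluation prompt), since models do not always emit valid structured output.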
Let's put this into a more concrete scenario. Suppose an agent is tasked with creating a travel itinerary. After generating an initial draft, a well-designed evaluation prompt can help catch errors or omissions.
Agent's Initial Task: "Plan a 3-day weekend trip to San Francisco for two adults for next month. Include return flights from LAX, a hotel near Fisherman's Wharf with good reviews (4 stars+), and at least one major tourist attraction per day. Budget is $1500."
Agent's (Potentially Flawed) Initial Plan Output (illustrative):

- Flights: Round-trip LAX-SFO for two adults, estimated $450 total.
- Hotel: A hostel near Fisherman's Wharf (3-star rating), 2 nights, estimated $200.
- Day 1: Golden Gate Bridge. Day 2: Alcatraz Island. Day 3: Cable car ride.
- Estimated total: $650 (flights and hotel only).
Evaluation Prompt:
Here is the proposed travel plan:
[Insert Agent's Initial Plan Output here]
Please evaluate this plan against the original request and general travel planning best practices. Consider the following:
1. **Goal Alignment**: Does the plan fully address all components of the original request (destination, duration, number of people, flight origin, hotel location preference, hotel star rating, daily attractions, budget)? List any deviations or omissions.
2. **Viability & Realism**:
* Are the flight and hotel prices realistic for next month? (You can state if you need to use a tool to check this).
* Is the hotel's star rating and location as requested?
* Is the budget of $1500 likely to be met, considering flights, hotel, attractions, food, and local transport (estimate if necessary)?
3. **Quality & Completeness**:
* Are return flights included?
* Are specific dates or times for travel/attractions considered or mentioned as needing finalization?
* Is the hotel choice suitable for "two adults" (e.g., not a dorm in a hostel if that's implied)?
4. **Overall Assessment**: Provide a brief summary of the plan's quality (e.g., Excellent, Good, Fair, Poor) and list 2-3 specific suggestions for improvement.
By using such a prompt, the agent is guided to self-identify issues like the wrong hotel star rating, potential budget problems if food/local transport isn't factored in, and the hostel choice for "two adults." This feedback then drives the refinement process.
Equipping AI agents with the ability to critically evaluate their own plans is a significant step towards more autonomous and reliable systems. Through careful prompt engineering, we can guide agents to scrutinize their proposed actions for feasibility, efficiency, completeness, and adherence to constraints. This self-correction mechanism, often employed in an iterative loop, allows agents to refine their strategies, avoid missteps, and ultimately produce higher-quality outcomes. As you move into practical exercises, you will apply these techniques to build agents that not only plan but also reflect on the quality of those plans.
© 2025 ApX Machine Learning