This hands-on exercise guides you through analyzing and improving a multi-agent LLM system, the "Automated Research Digest System" (ARDS). The exercise walks through diagnosing issues, implementing optimizations, and evaluating the results. ARDS is designed to take a general research topic, break it down, search for relevant academic information, and compile a structured digest.
Our ARDS consists of the following agents:
- QueryPlannerAgent: Receives a broad research topic and decomposes it into several specific, answerable questions.
- ParallelSearchAgent: Manages a pool of SearchSubAgent instances. It distributes the specific questions from the QueryPlannerAgent to these sub-agents.
- SearchSubAgent (multiple instances): Each instance takes a single question, formulates a query, and interacts with an external (mock) AcademicSearchAPI to retrieve information.
- ContentSynthesizerAgent: Gathers all search results from the various SearchSubAgents, synthesizes the information, and prepares coherent answers for each initial question.
- ReportGeneratorAgent: Takes the synthesized content and formats it into a final, structured research digest.

The intended workflow is as follows:
The Automated Research Digest System (ARDS) workflow, from user input to final report.
Despite its design, users have reported issues: ARDS is often slow, reports can be incomplete, and operational costs (primarily LLM token usage) are higher than anticipated.
Before making changes, we establish a baseline. Assume we've run ARDS on a set of test topics and collected the following initial metrics:
| Metric | Before Optimization | Target |
|---|---|---|
| Avg. Task Completion Time | 125 seconds | < 60 sec |
| API Call Success Rate | 85% | > 99% |
| Avg. API Latency (per call) | 8 seconds | < 4 sec |
| Cost per Report (Tokens) | 150,000 | < 100,000 |
| Report Completeness Score | 3/5 (avg) | 5/5 |
A review of system logs reveals patterns like these:
[2023-11-10 14:30:10] [QueryPlannerAgent] INFO: Topic "AI in Healthcare Diagnostics" decomposed into 5 questions.
[2023-11-10 14:30:11] [ParallelSearchAgent] INFO: Dispatched 5 search tasks to sub-agents.
[2023-11-10 14:30:15] [SearchSubAgent-3] INFO: Querying AcademicSearchAPI for "Ethical concerns of AI in diagnostics".
[2023-11-10 14:30:28] [SearchSubAgent-3] INFO: Received 7 results from API. Latency: 13250ms.
[2023-11-10 14:30:30] [SearchSubAgent-1] INFO: Querying AcademicSearchAPI for "Current AI algorithms for cancer detection".
[2023-11-10 14:30:45] [SearchSubAgent-1] ERROR: API call failed for "Current AI algorithms for cancer detection". Error: Connection Timeout. Attempt 1/1.
[2023-11-10 14:30:46] [SearchSubAgent-2] WARN: AcademicSearchAPI rate limit likely hit. Delaying next request.
...
[2023-11-10 14:32:05] [ContentSynthesizerAgent] WARN: Received results for only 3 out of 5 planned questions. Proceeding with available data.
[2023-11-10 14:32:15] [ContentSynthesizerAgent] INFO: Synthesis complete. Total input tokens: 95000. Output tokens: 8000.
From these logs and metrics, we can infer:
- SearchSubAgent calls to AcademicSearchAPI are slow (e.g., 13250ms latency) and prone to failure (timeouts, rate limits). This directly impacts overall task completion time and report completeness.
- The SearchSubAgent-1 error shows a single attempt ("Attempt 1/1"). Failures are not retried, leading to lost data for the ContentSynthesizerAgent.
- The ContentSynthesizerAgent processes a large number of input tokens (95,000), suggesting that raw, verbose search results are being passed along, contributing to high costs.

Based on our initial analysis, we can pinpoint specific areas for improvement:
Bottleneck 1: AcademicSearchAPI Interaction Efficiency and Reliability
The slow (8 seconds on average, with spikes above 13 seconds) and failure-prone API calls are the dominant drag on task completion time and the main source of missing data.
Failure Point 1: Lack of Resilience in ParallelSearchAgent and SearchSubAgents
When a SearchSubAgent fails to retrieve data (e.g., due to an API error), that piece of information is simply missing from the final report. The system has no mechanism to retry effectively or to signal the problem upstream for alternative action.

Cost Driver 1: Inefficient Data Handling and Token Usage in ContentSynthesizerAgent
The 95,000 input tokens logged by the ContentSynthesizerAgent indicate that it is likely receiving extensive, unfiltered text from the search results. This makes the LLM's job harder, increases processing time, and significantly drives up token costs.

Let's devise strategies to address these issues.
Improvement A: Enhancing SearchSubAgent and AcademicSearchAPI Interaction
Implement Retries with Exponential Backoff: Configure each SearchSubAgent to retry failed API calls using an exponential backoff strategy. For example, wait 2s, then 4s, then 8s between retries, up to a maximum of 3-5 attempts. This helps overcome transient network issues or temporary rate limits. A runnable sketch, assuming the (mock) API client provides actual_api_call, APITimeoutError, and APIRateLimitError:
```python
import asyncio
import logging

log = logging.getLogger("SearchSubAgent")

async def call_academic_api(query, max_retries=3):
    delay = 2  # initial backoff in seconds
    for attempt in range(max_retries):
        try:
            return await actual_api_call(query)  # provided by the (mock) API client
        except APITimeoutError:
            log.warning(f"API Timeout (Attempt {attempt+1}/{max_retries}). Retrying in {delay}s...")
            await asyncio.sleep(delay)
            delay *= 2  # exponential backoff
        except APIRateLimitError:
            # Rate limits get a longer delay than plain timeouts
            log.warning(f"API Rate Limit (Attempt {attempt+1}/{max_retries}). Retrying in {delay*2}s...")
            await asyncio.sleep(delay * 2)
            delay *= 2
    log.error(f"API call failed after {max_retries} attempts for query: {query}")
    return None
```
Implement Response Caching: Cache AcademicSearchAPI responses. If the same specific question (or a very similar one) is asked again within a short timeframe, serve the cached result to reduce latency and API load (see the sketch after this list).

Refine Query Formulation: Improve the prompts that guide each SearchSubAgent to formulate queries. Instruct the agent to make queries more specific, or to request summaries from the API if the API supports such features, thereby reducing the volume of data transferred and processed.
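A minimal caching sketch; the class name, TTL default, and key normalization are illustrative assumptions, not part of ARDS:

```python
import time

class ResponseCache:
    """Tiny in-memory TTL cache for AcademicSearchAPI responses."""

    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self._store = {}  # normalized query -> (timestamp, response)

    def get(self, query):
        entry = self._store.get(query.strip().lower())
        if entry and time.time() - entry[0] < self.ttl:
            return entry[1]  # cache HIT: entry is still fresh
        return None  # cache MISS or expired

    def put(self, query, response):
        self._store[query.strip().lower()] = (time.time(), response)
```

A SearchSubAgent would call get() before hitting the API and put() after each successful response; entries older than the TTL are simply treated as misses.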
Improvement B: Improving ParallelSearchAgent Resilience

Task Tracking: The ParallelSearchAgent should actively track the success or failure of each SearchSubAgent task.

Fallback Strategies: If a SearchSubAgent definitively fails after all retries, the ParallelSearchAgent could:
- Notify the ContentSynthesizerAgent about the missing piece of information, so it can acknowledge the gap in the report.
- Re-dispatch a SearchSubAgent with a slightly broader or rephrased query for the missing item.

A sketch of this outcome tracking appears below.
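One lightweight implementation, assuming each search task is an awaitable search_fn (a hypothetical helper): gather all tasks with return_exceptions=True so one failure cannot abort the batch, then separate successes from gaps:

```python
import asyncio

async def run_searches(questions, search_fn):
    # return_exceptions=True keeps one failed sub-agent from aborting the batch
    outcomes = await asyncio.gather(
        *(search_fn(q) for q in questions),
        return_exceptions=True,
    )
    answered, missing = {}, []
    for question, outcome in zip(questions, outcomes):
        if isinstance(outcome, Exception) or outcome is None:
            missing.append(question)  # acknowledge the gap or re-dispatch
        else:
            answered[question] = outcome
    return answered, missing
```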
Improvement C: Optimizing ContentSynthesizerAgent for Cost and Efficiency

Pre-Summarization by SearchSubAgents: Modify the SearchSubAgents so that, after retrieving results from the API, each one uses an LLM call to produce an initial, concise summary or extract the important facts relevant to its specific question before sending data to the ContentSynthesizerAgent. This significantly reduces the input token load on the ContentSynthesizerAgent. A sketch of this step, and the prompt it would use, follow.
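A minimal sketch of the pre-summary step; PRE_SUMMARY_PROMPT stands for the template shown next, and llm_complete is an assumed async helper that sends a prompt to the LLM and returns its text:

```python
async def pre_summarize(question, raw_results, llm_complete):
    # PRE_SUMMARY_PROMPT is the template shown below (assumed to be defined)
    prompt = (
        PRE_SUMMARY_PROMPT.format(original_question=question)
        + "\n\nSearch results:\n" + raw_results
    )
    # The short summary replaces the verbose raw results downstream
    return await llm_complete(prompt)
```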
Example prompt for the SearchSubAgent's pre-summary step: "You are an expert research assistant. Given the following search results for the question '{original_question}', extract the 3-5 most important facts or provide a concise summary of no more than 150 words. Focus only on information directly answering the question."

Refined Prompt for ContentSynthesizerAgent: Adjust the ContentSynthesizerAgent's prompt to expect pre-processed, summarized inputs. Its task becomes integrating these focused summaries into a coherent narrative.
ContentSynthesizerAgent: "You are an expert report writer. You have received several pieces of summarized information, each answering a specific sub-question of the main research topic '{main_topic}'. Your task is to synthesize these summaries into a single, coherent section for that sub-question. Ensure smooth transitions and logical flow. Here is the summarized information for sub-question '{sub_question_text}': {list_of_summaries_for_sub_question}"Improvement D: Dynamic Agent Scaling (Brief Mention)
For systems handling variable loads, the ParallelSearchAgent could be designed to dynamically adjust the number of concurrent SearchSubAgent instances based on the number of questions generated by QueryPlannerAgent or real-time feedback on AcademicSearchAPI responsiveness. This prevents overwhelming the API and optimizes resource use.
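A common way to cap concurrency is an asyncio semaphore, sketched below; max_concurrent is an assumed tuning knob that a fuller implementation might adjust from live API latency feedback:

```python
import asyncio

async def dispatch_bounded(questions, search_fn, max_concurrent=3):
    sem = asyncio.Semaphore(max_concurrent)

    async def limited(question):
        async with sem:  # at most max_concurrent API calls in flight
            return await search_fn(question)

    return await asyncio.gather(*(limited(q) for q in questions))
```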
After implementing these changes, we rerun our tests. The new logs might look like this:
[2023-11-10 15:00:12] [QueryPlannerAgent] INFO: Topic "AI in Healthcare Diagnostics" decomposed into 5 questions.
[2023-11-10 15:00:13] [ParallelSearchAgent] INFO: Dispatched 5 search tasks to sub-agents.
[2023-11-10 15:00:16] [SearchSubAgent-3] INFO: Querying AcademicSearchAPI for "Ethical concerns of AI in diagnostics". (Cache MISS)
[2023-11-10 15:00:22] [SearchSubAgent-3] INFO: Received 7 results. Latency: 6150ms. Performing pre-summary.
[2023-11-10 15:00:24] [SearchSubAgent-3] INFO: Pre-summary complete. Tokens: 800 -> 120.
[2023-11-10 15:00:25] [SearchSubAgent-1] INFO: Querying AcademicSearchAPI for "Current AI algorithms for cancer detection". (Cache MISS)
[2023-11-10 15:00:30] [SearchSubAgent-1] WARN: API Timeout (Attempt 1/3). Retrying in 2s...
[2023-11-10 15:00:33] [SearchSubAgent-1] INFO: Received 5 results. Latency: 2800ms (after retry). Performing pre-summary.
[2023-11-10 15:00:35] [SearchSubAgent-1] INFO: Pre-summary complete. Tokens: 650 -> 100.
...
[2023-11-10 15:01:05] [ContentSynthesizerAgent] INFO: Received pre-summarized results for all 5 planned questions.
[2023-11-10 15:01:10] [ContentSynthesizerAgent] INFO: Synthesis complete. Total input tokens: 2500 (sum of pre-summaries). Output tokens: 7500.
The updated metrics table reflects these improvements:
| Metric | Before Optimization | After Optimization | Target |
|---|---|---|---|
| Avg. Task Completion Time | 125 seconds | 45 seconds | < 60 sec |
| API Call Success Rate | 85% | 99.5% | > 99% |
| Avg. API Latency (per call) | 8 seconds | 2.5 seconds | < 4 sec |
| Cost per Report (Tokens) | 150,000 | 45,000 | < 100,000 |
| Report Completeness Score | 3/5 (avg) | 4.9/5 (avg) | 5/5 |
Average task completion time for ARDS before and after optimization efforts.
The optimizations have led to significant gains:
- Retries with exponential backoff and response caching raised the API success rate from 85% to 99.5% and more than halved per-call latency, which in turn improved completion time and report completeness.
- Pre-summarization in the SearchSubAgents drastically cuts down the token input to the ContentSynthesizerAgent, leading to substantial cost savings.

This iterative process of analysis, diagnosis, implementation, and re-evaluation is fundamental to maintaining and enhancing multi-agent LLM systems.
While our initial improvements are substantial, further refinements are always possible. Consider these advanced points for continuous improvement of a system like ARDS:
Granular Observability: Implement structured logging and distributed tracing (using libraries like OpenTelemetry and platforms such as LangSmith or Langfuse). This allows you to visualize the entire flow of a request across all agents, inspect individual LLM prompts and responses, track token counts per agent, and pinpoint precise latencies for each step. This detailed view is invaluable for debugging subtle issues and identifying further optimization opportunities.
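Even without a full tracing platform, emitting one structured record per LLM call covers much of this; the function and field names below are illustrative assumptions:

```python
import json
import logging
import time

log = logging.getLogger("ards")

def log_llm_call(agent, prompt_tokens, completion_tokens, latency_ms):
    # One JSON record per call makes per-agent token and latency rollups trivial
    log.info(json.dumps({
        "ts": time.time(),
        "agent": agent,
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "latency_ms": latency_ms,
    }))
```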
Human-in-the-Loop (HITL) Integration: For particularly complex research topics where ARDS might struggle with details or generating high-quality query plans, consider adding HITL checkpoints.
- After the QueryPlannerAgent generates its questions, a human expert could briefly review and approve/edit them before they are passed to the ParallelSearchAgent. This can prevent wasted effort on poorly formulated questions (see the sketch after this list).
- A human could review drafts from the ReportGeneratorAgent for quality assurance, especially for critical applications. The system could even flag reports where the ContentSynthesizerAgent noted low confidence or missing data.
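A deliberately simple, console-based sketch of the first checkpoint; the review flow and function name are assumptions:

```python
def review_questions(questions):
    """Let a human approve, edit, or drop each planned question."""
    approved = []
    for q in questions:
        choice = input(f"Keep '{q}'? [y]es / [e]dit / [n]o: ").strip().lower()
        if choice == "y":
            approved.append(q)
        elif choice == "e":
            approved.append(input("Revised question: ").strip())
        # "n" (or anything else) drops the question
    return approved
```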
A/B Testing Agent Configurations: Experiment with different configurations for main agents. For example:

- Run two variants of the ContentSynthesizerAgent: one using a standard summarization prompt and another using a prompt that encourages more critical analysis or comparison of information.
- Compare model choices across agents (e.g., a faster, cheaper model for the SearchSubAgents versus a more powerful model for the final ContentSynthesizerAgent).

Security for External API Interactions:
If the AcademicSearchAPI required an API key, ensure it's stored securely (e.g., in a secrets manager) and accessed by the SearchSubAgents without being exposed in logs or code.
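A minimal sketch, assuming the key is provisioned as an environment variable (the variable name is an assumption):

```python
import os

# Read the key from the environment; never hard-code it or log its value
API_KEY = os.environ.get("ACADEMIC_SEARCH_API_KEY")
if API_KEY is None:
    raise RuntimeError("ACADEMIC_SEARCH_API_KEY is not set")
```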
Fine-grained Cost Attribution and Management: Track token usage per agent and per report run (as in the structured-logging sketch above) so that costs can be attributed precisely and regressions caught early.

By systematically applying evaluation, debugging, and tuning techniques, you can transform a functional multi-agent LLM system into one that is efficient, reliable, and cost-effective. Remember that optimization is often an ongoing process as system requirements evolve and new interaction patterns emerge.