ReAct: Synergizing Reasoning and Acting in Language Models, Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, Yuan Cao, 2023. arXiv preprint arXiv:2210.03629. DOI: 10.48550/arXiv.2210.03629 - Describes the ReAct framework, a common architecture for LLM agents, whose execution flow and internal states are the subject of the debugging strategies discussed in this section.
Langfuse Documentation: Observability for LLM Applications, Langfuse Team, 2024 - Provides practical guidance and tools for logging, tracing, and visualizing LLM agent executions, directly supporting the "Comprehensive Logging and Tracing" and "Debugging Interfaces" sections.
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, Denny Zhou, 2022. arXiv preprint arXiv:2201.11903. DOI: 10.48550/arXiv.2201.11903 - Introduces Chain-of-Thought prompting, a technique that makes LLM reasoning steps explicit, which is fundamental for analyzing and debugging "Reasoning/Planning Errors" in agent behavior.
AgentBench: Evaluating LLMs as Agents, Xiao Liu, Hao Yu, Hanchen Zhang, Yifan Xu, Xuanyu Lei, Hanyu Lai, Yu Gu, Hangliang Ding, Kaiwen Men, Kejuan Yang, Shudan Zhang, Xiang Deng, Aohan Zeng, Zhengxiao Du, Chenhui Zhang, Sheng Shen, Tianjun Zhang, Yu Su, Huan Sun, Minlie Huang, Yuxiao Dong, Jie Tang, 2023. arXiv preprint arXiv:2308.03688. DOI: 10.48550/arXiv.2308.03688 - Provides a comprehensive benchmark for evaluating LLMs as agents across various tasks, offering insights into common failure modes and challenging scenarios that inform debugging strategies.