Reinforcement Learning: An Introduction, Richard S. Sutton and Andrew G. Barto, 2018 (The MIT Press) - This authoritative textbook provides a comprehensive explanation of the deadly triad, detailing the convergence and divergence issues arising from the combination of function approximation, bootstrapping, and off-policy learning.
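To make the triad concrete, here is a minimal sketch that combines all three ingredients in one update rule: a linear approximator, a bootstrapped TD(0) target, and importance-weighted off-policy corrections. The function, variable names, and hyperparameters are illustrative assumptions, not taken from the book.

```python
import numpy as np

# Illustrative sketch (not from the book): one sweep of semi-gradient
# off-policy TD(0) with linear function approximation. Combining these
# three ingredients -- function approximation, bootstrapping, and
# off-policy sampling -- is exactly the "deadly triad".
def semi_gradient_off_policy_td0(features, rewards, next_features, rho,
                                 alpha=0.01, gamma=0.99):
    """features:      (T, d) feature vectors phi(s_t)
    rewards:       (T,)   rewards r_{t+1}
    next_features: (T, d) feature vectors phi(s_{t+1})
    rho:           (T,)   importance ratios pi(a_t|s_t) / b(a_t|s_t)
    """
    w = np.zeros(features.shape[1])
    for phi, r, phi_next, rho_t in zip(features, rewards, next_features, rho):
        delta = r + gamma * (phi_next @ w) - (phi @ w)  # bootstrapped TD error
        w += alpha * rho_t * delta * phi                # semi-gradient: target held fixed
    return w
```

Each ingredient is individually benign; the book demonstrates that their combination can make the weights diverge even on small, well-specified MDPs.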
Residual algorithms for reinforcement learning, Leemon C. Baird, 1995, Proceedings of the Twelfth International Conference on Machine Learning (ICML) (Elsevier Science & Technology Books), DOI: 10.1145/224483.224489 - An early foundational paper that highlights the instability and divergence problems of off-policy temporal-difference learning with function approximation, Q-learning in particular, and proposes 'residual algorithms' to address these issues.
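As a rough sketch of Baird's idea (the variable names and the exact mixing form below are assumptions for illustration), the residual update interpolates between the ordinary semi-gradient direction and a gradient that also flows through the bootstrapped target.

```python
import numpy as np

# Hedged sketch of a residual-style TD(0) update with linear features.
# omega mixes the direct (semi-gradient) method (omega = 0) with the
# residual-gradient method (omega = 1), which also differentiates the
# bootstrap target. Parameter names are illustrative, not Baird's.
def residual_td0_update(w, phi, r, phi_next, alpha=0.01, gamma=0.99, omega=0.5):
    delta = r + gamma * (phi_next @ w) - (phi @ w)    # Bellman residual
    direction = phi - omega * gamma * phi_next        # mixed gradient direction
    return w + alpha * delta * direction
```

For stochastic transitions the full residual gradient is biased unless two independent successor samples are used, the 'double sampling' caveat the paper addresses.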
Human-level control through deep reinforcement learning, Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis, 2015, Nature, Vol. 518, DOI: 10.1038/nature14236 - This landmark paper introduces Deep Q-Networks (DQN), a method that successfully combines deep neural networks with Q-learning by employing experience replay and target networks, directly overcoming the instability challenges posed by the deadly triad.
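A compact sketch of the paper's two stabilizers, experience replay and a periodically frozen target network, is given below; a linear Q-function stands in for the deep network, and the buffer size, learning rate, and update details are assumptions rather than values from the paper.

```python
import random
from collections import deque
import numpy as np

# Illustrative sketch: the replay buffer decorrelates updates, and the
# target weights w_target are frozen between periodic copies, so the
# bootstrap target does not chase the online weights at every step.
class ReplayBuffer:
    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):          # transition = (s, a, r, s_next, done)
        self.buffer.append(transition)

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

def dqn_update(w, w_target, batch, alpha=1e-3, gamma=0.99):
    """One Q-learning step over a sampled mini-batch; w_target stays fixed."""
    for s, a, r, s_next, done in batch:
        q = s @ w[:, a]                                        # Q(s, a; w)
        target = r if done else r + gamma * np.max(s_next @ w_target)
        w[:, a] += alpha * (target - q) * s                    # move Q(s, a) toward target
    return w

# Every C gradient steps, sync the target network: w_target = w.copy()
```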