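Residual algorithms for reinforcement learning, Leemon C. Baird, 1995. Proceedings of the Twelfth International Conference on Machine Learning (ICML) (Elsevier Science & Technology Books). DOI: 10.1145/224483.224489 - An early foundational paper that highlights the instability and divergence of off-policy temporal-difference learning with function approximation (Q-learning in particular) and proposes "residual algorithms" to address these problems.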
Human-level control through deep reinforcement learning, Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis, 2015. Nature, Vol. 518. DOI: 10.1038/nature14236 - This landmark paper introduces the Deep Q-Network (DQN), which successfully combines deep neural networks with Q-learning by using experience replay and a target network, directly addressing the instability caused by the "deadly triad".