Human-level control through deep reinforcement learning, Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis, 2015. Nature, Vol. 518 (Springer Nature). DOI: 10.1038/nature14236 - This paper presents the deep Q-network (DQN) and introduces the use of a separate target network to stabilize training.
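Reinforcement Learning: An Introduction, Richard S. Sutton and Andrew G. Barto, 2018 (The MIT Press) - A comprehensive textbook covering the theoretical foundations of reinforcement learning, including Q-learning, temporal-difference (TD) learning, and function approximation, which laid the groundwork for DQN.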