Human-level control through deep reinforcement learning, Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg and Demis Hassabis, 2015, Nature, Vol. 518, DOI: 10.1038/nature14236 - The foundational paper introducing the Deep Q-Network (DQN). It highlights key stabilization techniques such as experience replay and target networks, which are essential for understanding and debugging training instability.