Human-level control through deep reinforcement learning, Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg and Demis Hassabis, 2015. Nature, Vol. 518. DOI: 10.1038/nature14236 - Introduces the foundational Deep Q-Network (DQN) algorithm, which applies Q-learning to high-dimensional state spaces by combining a deep neural network with experience replay.
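Double Q-learning, Hado van Hasselt, 2010. Advances in Neural Information Processing Systems 23 (NIPS 2010), Vol. 23 (Curran Associates, Inc.) - Introduces the Double Q-learning algorithm, which reduces overestimation bias by using separate value estimates for action selection and action evaluation.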