Reinforcement Learning: An Introduction, Richard S. Sutton and Andrew G. Barto, 2018 (The MIT Press) - A foundational book on reinforcement learning, covering Q-learning, value function estimation, and related algorithms.
Human-level control through deep reinforcement learning, Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg and Demis Hassabis, 2015, Nature, Vol. 518, DOI: 10.1038/nature14236 - Presents the Deep Q-Network (DQN) algorithm, which adapts Q-learning to high-dimensional state spaces using deep neural networks, experience replay, and a periodically updated target network.
Double Q-learning, Hado van Hasselt, 2010, Advances in Neural Information Processing Systems 23 (NIPS 2010) (Curran Associates, Inc.) - Introduces the Double Q-learning algorithm, which mitigates overestimation bias by using separate value estimates for action selection and action evaluation.
Deep Reinforcement Learning with Double Q-learning, Hado van Hasselt, Arthur Guez and David Silver, 2016, AAAI Conference on Artificial Intelligence (AAAI), DOI: 10.48550/arXiv.1509.06461 - Proposes Double Deep Q-Networks (DDQN), an extension of Double Q-learning to deep reinforcement learning that specifically addresses the overestimation bias present in DQN. This is the primary reference for the section.