Reinforcement Learning: An Introduction, Richard S. Sutton and Andrew G. Barto, 2018 (MIT Press) - Comprehensive textbook explaining Q-learning, temporal difference learning, and the maximization bias problem, along with Double Q-learning as a solution.
Double Q-learning, Hado van Hasselt, 2010, Advances in Neural Information Processing Systems (NIPS), Vol. 23 (Neural Information Processing Systems Foundation) - Introduces Double Q-learning to mitigate maximization bias in reinforcement learning, providing a theoretical analysis.
Deep Reinforcement Learning with Double Q-learning, Hado van Hasselt, Arthur Guez, and David Silver, 2016, Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI) - Extends Double Q-learning to Deep Q-Networks (DQN), demonstrating its effectiveness in reducing overestimation and improving performance.