Double Q-learning, Hado van Hasselt, 2010. Advances in Neural Information Processing Systems (NIPS), Vol. 23 (Neural Information Processing Systems Foundation) - Introduces Double Q-learning to mitigate maximization bias in reinforcement learning and provides a theoretical analysis.
Deep Reinforcement Learning with Double Q-learning, Hado van Hasselt, Arthur Guez, and David Silver, 2016. Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI) - Extends Double Q-learning to deep Q-networks (DQN) and demonstrates its effectiveness in reducing overestimation and improving performance.