Human-level control through deep reinforcement learning, Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis, 2015. Nature, Vol. 518. DOI: 10.1038/nature14236 - This paper introduces Deep Q-Networks (DQN), the single-agent algorithm that forms the basis for Independent Deep Q-Learning (IDQN) when applied to high-dimensional observation spaces.
Continuous control with deep reinforcement learning, Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, Daan Wierstra, 2015. International Conference on Learning Representations (ICLR). DOI: 10.48550/arXiv.1509.02971 - This paper presents Deep Deterministic Policy Gradient (DDPG), a model-free, off-policy algorithm for continuous action spaces, which is the underlying method for Independent DDPG (IDDPG).