Human-level control through deep reinforcement learning, Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis, 2015, Nature, Vol. 518, DOI: 10.1038/nature14236 - This paper introduces the Deep Q-Network (DQN), the single-agent algorithm that underlies independent deep Q-learning (IDQN) when applied to high-dimensional observation spaces.
Continuous control with deep reinforcement learning, Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, Daan Wierstra, 2015, International Conference on Learning Representations (ICLR), DOI: 10.48550/arXiv.1509.02971 - This paper introduces Deep Deterministic Policy Gradient (DDPG), a model-free, off-policy algorithm for continuous action spaces; it is the underlying method for independent DDPG (IDDPG).