Continuous Control with Deep Reinforcement Learning, Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, Daan Wierstra, 2015, International Conference on Learning Representations (ICLR), DOI: 10.48550/arXiv.1509.02971 - Introduces the Deep Deterministic Policy Gradient (DDPG) algorithm, which popularized the soft target updates (Polyak averaging) mentioned in the text.