Parameter Space Noise for Exploration, Matthias Plappert, Rein Houthooft, Prafulla Dhariwal, Szymon Sidor, Richard Y. Chen, Xi Chen, Tamim Asfour, Pieter Abbeel, Marcin Andrychowicz, 2017. International Conference on Learning Representations (ICLR). DOI: 10.48550/arXiv.1706.01905 - The seminal paper introducing parameter space noise as an exploration strategy; it also describes an adaptive noise scaling mechanism in detail.
Continuous Control with Deep Reinforcement Learning, Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra, 2015. International Conference on Learning Representations (ICLR). DOI: 10.48550/arXiv.1509.02971 - Introduces the Deep Deterministic Policy Gradient (DDPG) algorithm, a common setting in which parameter space noise offers a significant exploration advantage.