Human-level control through deep reinforcement learning, Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis, 2015. Nature, Vol. 518. DOI: 10.1038/nature14236 - This original paper introduces the Deep Q-Network (DQN) and details the experience replay mechanism and its role in stabilizing training.
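Prioritized Experience Replay, Tom Schaul, John Quan, Ioannis Antonoglou, and David Silver, 2016. International Conference on Learning Representations (ICLR). DOI: 10.48550/arXiv.1511.05952 - This paper introduces prioritized experience replay, an advanced sampling strategy that improves on uniform sampling by prioritizing experiences with higher temporal-difference error.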