Human-level control through deep reinforcement learning, Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis, 2015. Nature, Vol. 518. DOI: 10.1038/nature14236 - This paper introduces the deep Q-network (DQN), shows how value-based methods can be combined with deep learning, and discusses challenges such as training instability along with remedies such as experience replay.
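Actor-Critic Algorithms, Vijay R. Konda, John N. Tsitsiklis, 2000. Advances in Neural Information Processing Systems, Vol. 12 (The MIT Press) - A foundational paper on the actor-critic architecture, which combines policy-based and value-based methods to improve learning stability and efficiency.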