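Asynchronous Methods for Deep Reinforcement Learning, Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu, 2016, Proceedings of The 33rd International Conference on Machine Learning, Vol. 48 (PMLR) - A landmark paper demonstrating successful deep reinforcement learning with an actor-critic architecture. The method uses a learned value function as a baseline to reduce the variance of the policy gradient, and asynchronous updates from parallel actor-learners to stabilize training.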