Asynchronous Methods for Deep Reinforcement Learning, Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu, 2016. International Conference on Machine Learning (ICML). DOI: 10.48550/arXiv.1602.01783 - Introduces the A3C algorithm, characterized by asynchronous updates, a shared network architecture, and entropy regularization, for deep actor-critic agents.