Asynchronous Methods for Deep Reinforcement Learning, Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu, 2016 (ICML 2016), DOI: 10.48550/arXiv.1602.01783 - Introduces Asynchronous Advantage Actor-Critic (A3C), the asynchronous precursor to A2C, detailing the advantage function, shared network architecture, and parallel learning strategies.
Reinforcement Learning: An Introduction, Richard S. Sutton and Andrew G. Barto, 2018 (MIT Press) - Provides foundational concepts of Actor-Critic methods, policy gradients, value functions, and the use of baselines for variance reduction. See Chapter 13.
Spinning Up in Deep RL: Actor-Critic, Joshua Achiam, 2018 (OpenAI) - A practical guide explaining the implementation details of Actor-Critic algorithms, including A2C, its architecture, loss functions, and training considerations.