Reinforcement Learning: An Introduction, Richard S. Sutton and Andrew G. Barto, 2018 (MIT Press) - This textbook is the definitive introduction to reinforcement learning, with comprehensive coverage of policy gradient methods, including REINFORCE, and detailed explanations of Actor-Critic algorithms and their advantages.
Asynchronous Methods for Deep Reinforcement Learning, Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu, 2016, ICML 2016, DOI: 10.48550/arXiv.1602.01783 - This paper introduced A3C, a highly influential deep reinforcement learning algorithm that uses asynchronous parallel agents to efficiently train Actor-Critic models, demonstrating a practical and scalable approach.
Actor-Critic Algorithms, Vijay R. Konda, John N. Tsitsiklis, 1999, Advances in Neural Information Processing Systems, Vol. 12 (MIT Press) - A foundational paper that provides an early theoretical treatment of Actor-Critic algorithms, laying the groundwork for subsequent developments by formalizing their structure and convergence properties.
Reinforcement Learning Lecture 6: Policy Gradient Methods, David Silver, 2015 (University College London, UCL) - This lecture from a renowned reinforcement learning course provides an excellent conceptual overview of policy gradient methods and introduces Actor-Critic as a key technique for variance reduction and stable learning.