Reinforcement Learning: An Introduction, Richard S. Sutton and Andrew G. Barto, 2018 (MIT Press) - Comprehensive textbook on reinforcement learning, detailing Actor-Critic methods, policy gradient theory, and value function approximation.
Asynchronous Methods for Deep Reinforcement Learning, Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu, 2016 (International Conference on Machine Learning, ICML). DOI: 10.48550/arXiv.1602.01783 - Introduces the A3C algorithm, featuring asynchronous updates, shared network architectures, and entropy regularization for deep Actor-Critic agents.
High-Dimensional Continuous Control Using Generalized Advantage Estimation, John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, Pieter Abbeel, 2016 (International Conference on Learning Representations, ICLR). DOI: 10.48550/arXiv.1506.02438 - Presents Generalized Advantage Estimation (GAE), a method for estimating the advantage term ($A_t$) in policy gradient methods that reduces variance at the cost of some bias. Commonly used in Actor-Critic algorithms.
Proximal Policy Optimization Algorithms, John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov, 2017 (arXiv preprint). DOI: 10.48550/arXiv.1707.06347 - Describes PPO, a stable and widely adopted policy gradient algorithm that uses an Actor-Critic architecture with a clipped objective.