Reinforcement Learning: An Introduction (2nd edition), Richard S. Sutton and Andrew G. Barto, 2018 (MIT Press) - A comprehensive textbook providing an in-depth explanation of policy gradient methods, including REINFORCE, its limitations, and the conceptual transition to actor-critic approaches for variance reduction.
Actor-Critic Algorithms, Vijay R. Konda and John N. Tsitsiklis, 1999 (Advances in Neural Information Processing Systems, Vol. 12, MIT Press) - Provides an early theoretical treatment of actor-critic algorithms, highlighting their potential benefits for reducing variance in policy gradient estimation compared to Monte Carlo methods.