Reinforcement Learning: An Introduction, Richard S. Sutton and Andrew G. Barto, 2018 (MIT Press) - This book is a standard reference for reinforcement learning, providing a thorough explanation of policy gradient methods, including the Policy Gradient Theorem and the REINFORCE algorithm.
Lecture 6: Policy Gradient, David Silver, 2015, UCL Course on Reinforcement Learning (University College London) - David Silver's lectures are highly influential. Lecture 6 specifically covers policy gradient methods, the Policy Gradient Theorem, and REINFORCE, often with a modern perspective relevant to deep reinforcement learning.
Asynchronous Methods for Deep Reinforcement Learning, Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu, 2016, Proceedings of The 33rd International Conference on Machine Learning, Vol. 48 (PMLR), DOI: 10.48550/arXiv.1602.01783 - While this paper introduces A3C, a key part is its discussion of advantage functions for variance reduction in policy gradients, which is directly relevant to the mentioned disadvantage of REINFORCE.