Reinforcement Learning: An Introduction, Second edition, Richard S. Sutton and Andrew G. Barto, 2018 (MIT Press) - A classic, comprehensive textbook on reinforcement learning that explains policy gradient methods in detail. It covers REINFORCE, its unbiasedness and high variance, and variance reduction using baselines.
Policy gradient methods for reinforcement learning with function approximation, Richard S. Sutton, David McAllester, Satinder Singh, Yishay Mansour, 1999, Advances in Neural Information Processing Systems 12 (MIT Press) - This influential paper formalizes the policy gradient theorems and discusses high variance, showing that subtracting a baseline can reduce variance without introducing bias. It provides the theoretical background for the variance reduction techniques mentioned.