Reinforcement Learning: An Introduction, Richard S. Sutton and Andrew G. Barto, 2018 (The MIT Press) - A foundational textbook covering both value-based and policy gradient methods, explaining their principles and the scenarios where policy gradient methods offer advantages, such as handling continuous action spaces and learning stochastic policies.
Continuous Control with Deep Reinforcement Learning, Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, Daan Wierstra, 2015 (International Conference on Learning Representations, ICLR), DOI: 10.48550/arXiv.1509.02971 - Introduces Deep Deterministic Policy Gradient (DDPG), an algorithm specifically designed for continuous action spaces, which directly addresses a significant limitation of traditional value-based methods like DQN.
Proximal Policy Optimization Algorithms, John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov, 2017, DOI: 10.48550/arXiv.1707.06347 - Presents Proximal Policy Optimization (PPO), a widely used and robust policy gradient algorithm known for its performance in various continuous and discrete control tasks, showcasing the effectiveness of direct policy optimization.