Reinforcement Learning: An Introduction (2nd ed.), Richard S. Sutton and Andrew G. Barto, 2018 (MIT Press) - Standard textbook covering the fundamental concepts and algorithms of reinforcement learning, including Q-learning and policy gradient methods.
Human-level control through deep reinforcement learning, Volodymyr Mnih, Koray Kavukcuoglu, David Silver, et al., 2015 (Nature, Vol. 518), DOI: 10.1038/nature14236 - Introduces Deep Q-Networks (DQN), which combine Q-learning with deep neural networks and stabilize training with experience replay and a separate target network.
Proximal Policy Optimization Algorithms, John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov, 2017 (arXiv preprint arXiv:1707.06347), DOI: 10.48550/arXiv.1707.06347 - Presents PPO, an algorithm that improves the stability and performance of policy gradient methods by constraining the size of each policy update.