Proximal Policy Optimization Algorithms, John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov, 2017. arXiv preprint arXiv:1707.06347. DOI: 10.48550/arXiv.1707.06347 - The original paper introducing the Proximal Policy Optimization (PPO) algorithm, detailing its clipped surrogate objective and characteristics for stable policy optimization.
Reinforcement Learning: An Introduction, Richard S. Sutton and Andrew G. Barto, 2018 (The MIT Press) - A classic textbook offering a comprehensive introduction to reinforcement learning concepts, covering policy gradient methods and the theoretical foundations for algorithms like PPO.