Reinforcement Learning: An Introduction, Richard S. Sutton and Andrew G. Barto, 2018 (MIT Press) - A comprehensive textbook on the theoretical foundations and algorithms of both value-based and policy-based reinforcement learning methods, including a detailed comparison of the two.
Human-level control through deep reinforcement learning, Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis, 2015, Nature, Vol. 518, DOI: 10.1038/nature14236 - This paper presents Deep Q-Networks (DQN), illustrating how value-based methods are extended with deep learning, and discusses challenges such as training instability and solutions such as experience replay.
Actor-Critic Algorithms, Vijay R. Konda, John N. Tsitsiklis, 2000, Advances in Neural Information Processing Systems, Vol. 12 (The MIT Press) - A foundational paper on Actor-Critic architectures, which combine elements of both policy-based and value-based methods to enhance learning stability and efficiency.