Reinforcement Learning: An Introduction, Richard S. Sutton and Andrew G. Barto, 2018 (The MIT Press) - An authoritative textbook detailing Temporal-Difference learning, SARSA, Q-learning, and the distinction between on-policy and off-policy methods, including illustrative examples like Cliff Walking.
Q-learning, Christopher J. C. H. Watkins and Peter Dayan, 1992, Machine Learning, Vol. 8, DOI: 10.1007/BF00992698 - The original academic paper introducing Q-learning, a foundational off-policy Temporal-Difference control algorithm.
On-line Q-learning using connectionist systems, Gavin A. Rummery and Mahesan Niranjan, 1994, Technical Report CUED/F-INFENG/TR 166 (Cambridge University Engineering Department) - This technical report presents the algorithm that was later explicitly named SARSA, an on-policy Temporal-Difference control method.
Reinforcement Learning (Course Lectures), David Silver, 2015 - A highly regarded lecture series from University College London that provides clear explanations of foundational reinforcement learning algorithms, including SARSA and Q-learning.