Off-Policy Deep Reinforcement Learning without Exploration, Scott Fujimoto, David Meger, Doina Precup, 2019, International Conference on Machine Learning (ICML). DOI: 10.48550/arXiv.1812.02900 - Presents Batch-Constrained deep Q-learning (BCQ), an influential early algorithm that addresses distributional shift by explicitly constraining the learned policy to stay close to the behavior policy.
Reinforcement Learning: An Introduction, Richard S. Sutton, Andrew G. Barto, 2018, MIT Press - A definitive textbook on reinforcement learning that covers Q-learning, off-policy methods, and the principles that make online interaction valuable, giving essential context for understanding the challenges of offline RL.