Behavior Regularized Offline Reinforcement Learning, Yifan Wu, George Tucker, Ofir Nachum, 2019, arXiv preprint arXiv:1911.11361. DOI: 10.48550/arXiv.1911.11361 - Proposes Behavior Regularized Actor-Critic (BRAC), a framework for offline RL that regularizes policy updates toward an explicit estimate of the behavior policy.
Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems, Sergey Levine, Aviral Kumar, George Tucker, Justin Fu, 2020, arXiv preprint arXiv:2005.01643. DOI: 10.48550/arXiv.2005.01643 - A comprehensive tutorial and review of offline reinforcement learning, including a detailed discussion of policy constraint methods and related challenges.