Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine, 2018. Proceedings of the 35th International Conference on Machine Learning (ICML), PMLR Vol. 80. DOI: 10.48550/arXiv.1801.01290 - Introduces the Soft Actor-Critic (SAC) algorithm, an off-policy actor-critic method with maximum entropy regularization for continuous control (a minimal sketch of the resulting actor update appears after this list).
Reinforcement Learning with Deep Energy-Based Policies, Tuomas Haarnoja, Haoran Tang, Pieter Abbeel, Sergey Levine, 2017. Proceedings of the 34th International Conference on Machine Learning (ICML), PMLR Vol. 70. DOI: 10.48550/arXiv.1702.08165 - Presents soft Q-learning and the theoretical foundation for maximum entropy reinforcement learning, which SAC builds upon (the entropy-regularized objective is written out after this list).
Spinning Up in Deep RL: Soft Actor-Critic, Josh Achiam, 2018 (OpenAI) - Provides a clear explanation of the SAC algorithm and a practical implementation guide, including pseudocode and implementation details.
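
For reference, the objective that both papers above optimize augments the expected return with the policy's entropy at each visited state, weighted by a temperature alpha. In the notation of the SAC paper:

    J(\pi) = \sum_{t=0}^{T} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi} \big[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \big]

where \rho_\pi is the state-action marginal induced by \pi and \mathcal{H} is entropy; setting \alpha = 0 recovers the standard expected-return objective.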
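
As a concrete illustration of the actor update referenced in the first entry, here is a minimal PyTorch sketch, not the reference implementation from any of the sources above. The names PolicyNet, QNet, and actor_loss are illustrative placeholders. SAC updates the policy by minimizing E[alpha * log pi(a|s) - Q(s, a)] over reparameterized actions, which corresponds (up to the temperature alpha) to the KL-projection policy objective in the SAC paper.

# Sketch of the SAC actor (policy) loss. All class and function names
# are illustrative assumptions, not taken from the referenced codebases.
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Gaussian policy with tanh squashing, as SAC uses for bounded actions."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                  nn.Linear(hidden, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, act_dim)
        self.log_std = nn.Linear(hidden, act_dim)

    def sample(self, obs: torch.Tensor):
        h = self.body(obs)
        mu, log_std = self.mu(h), self.log_std(h).clamp(-20, 2)
        dist = torch.distributions.Normal(mu, log_std.exp())
        u = dist.rsample()                 # reparameterized sample, keeps gradients
        a = torch.tanh(u)                  # squash action into (-1, 1)
        # log-prob with the tanh change-of-variables correction
        log_prob = dist.log_prob(u) - torch.log(1 - a.pow(2) + 1e-6)
        return a, log_prob.sum(-1)

class QNet(nn.Module):
    """Soft Q-function approximator: (state, action) -> scalar value."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, obs: torch.Tensor, act: torch.Tensor):
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

def actor_loss(policy: PolicyNet, q_net: QNet, obs: torch.Tensor, alpha: float):
    """SAC policy loss: E[alpha * log pi(a|s) - Q(s, a)] with a ~ pi(.|s)."""
    a, log_prob = policy.sample(obs)
    q = q_net(obs, a)                      # soft Q-value of the sampled action
    return (alpha * log_prob - q).mean()

The use of rsample (rather than sample) is the reparameterization trick: it lets gradients flow from the Q-value back through the sampled action into the policy parameters, which is how the off-policy actor update in the first reference is made low-variance.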