Reinforcement Learning: An Introduction, Richard S. Sutton and Andrew G. Barto, 2018 (A Bradford Book, The MIT Press) - A comprehensive and authoritative textbook that provides a foundational understanding of reinforcement learning, including the exploration-exploitation trade-off and basic exploration strategies like ε-greedy and Upper Confidence Bound (UCB) algorithms.
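The two action-selection rules named in this entry are easy to illustrate in a bandit setting. The sketch below is not from the book; it is a minimal paraphrase of ε-greedy and UCB selection, with illustrative names (q_values, counts, epsilon, c) chosen here for clarity.

```python
# Minimal sketch of epsilon-greedy and UCB action selection for a bandit problem.
# Names and constants are illustrative, not taken from the textbook.
import numpy as np

def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon explore uniformly; otherwise exploit the best estimate."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def ucb(q_values, counts, t, c=2.0):
    """Choose the action maximizing estimated value plus a bonus that shrinks
    as the action is tried more often (the optimism behind UCB)."""
    counts = np.asarray(counts, dtype=float)
    bonus = c * np.sqrt(np.log(t + 1) / (counts + 1e-8))  # large for rarely tried actions
    return int(np.argmax(np.asarray(q_values) + bonus))

rng = np.random.default_rng(0)
q, n = np.zeros(5), np.zeros(5)
print(epsilon_greedy(q, 0.1, rng), ucb(q, n, t=1))
```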
Intelligent Exploration in Deep Reinforcement Learning: A Review, Ran Zhao, Zheng Fang, Guanjun Liu, Chenbo Zhang, Yuanhao Li, and Wei Ding, 2021, IEEE Transactions on Emerging Topics in Computational Intelligence, Vol. 5 (IEEE), DOI: 10.1109/TETCI.2021.3087265 - A comprehensive review of advanced exploration strategies specifically tailored for deep reinforcement learning, addressing challenges in large-scale and complex environments.
Curiosity-driven Exploration by Self-supervised Prediction, Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, and Trevor Darrell, 2017, Proceedings of the 34th International Conference on Machine Learning (ICML), Vol. 70 (PMLR) - Introduces a widely influential intrinsic motivation method that uses self-supervised prediction of features to generate an exploration bonus, guiding agents to novel and surprising states.
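The core idea of this paper, an intrinsic reward equal to the prediction error of a forward model in feature space, can be sketched very compactly. The snippet below is a deliberate simplification: the full ICM also trains its feature encoder through an inverse model, whereas here the encoder is a fixed random projection and the forward model is linear; all names and constants are assumptions made for illustration.

```python
# Simplified sketch of a curiosity bonus: surprise = forward-model prediction error
# in feature space. Not the full ICM; encoder is a fixed random projection here.
import numpy as np

rng = np.random.default_rng(0)
obs_dim, feat_dim, n_actions = 16, 8, 4
encoder = rng.normal(size=(obs_dim, feat_dim)) / np.sqrt(obs_dim)  # fixed random features
W = np.zeros((feat_dim + n_actions, feat_dim))                     # forward-model weights
lr, eta = 0.1, 1.0                                                 # eta scales the bonus

def intrinsic_reward(obs, action, next_obs):
    """Return the curiosity bonus and update the forward model with one SGD step."""
    global W
    phi, phi_next = obs @ encoder, next_obs @ encoder
    x = np.concatenate([phi, np.eye(n_actions)[action]])
    err = x @ W - phi_next
    W -= lr * np.outer(x, err)           # gradient step on 0.5 * ||prediction error||^2
    return eta * 0.5 * float(err @ err)  # large reward for poorly predicted (novel) transitions

obs, next_obs = rng.normal(size=obs_dim), rng.normal(size=obs_dim)
print(intrinsic_reward(obs, 2, next_obs))
```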
Near-Optimal Regret Bounds for Reinforcement Learning, Thomas Jaksch, Ronald Ortner, and Peter Auer, 2010, Journal of Machine Learning Research, Vol. 11 - Presents UCRL2, a theoretically grounded algorithm that achieves near-optimal regret bounds for exploration in unknown finite Markov Decision Processes (MDPs) by applying the principle of optimism in the face of uncertainty.
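The "optimism in the face of uncertainty" step in UCRL2 amounts to picking, within a confidence set around the empirical transition estimates, the transition vector that makes the next state look as valuable as possible. The sketch below shows only that inner maximization over an L1 ball of assumed radius `radius`; the confidence-width formulas and the surrounding extended value iteration loop are omitted (see the paper for the exact bounds).

```python
# Sketch of the optimistic inner maximization used in extended value iteration:
# maximize p . value subject to ||p - p_hat||_1 <= radius, p a probability vector.
# The radius is a placeholder; UCRL2 derives it from visit counts and a confidence level.
import numpy as np

def optimistic_transitions(p_hat, value, radius):
    p = np.array(p_hat, dtype=float)
    order = np.argsort(value)                     # states from worst to best value
    best = order[-1]
    p[best] = min(1.0, p_hat[best] + radius / 2)  # shift mass toward the best next state
    for s in order:                               # take the excess mass from the worst states
        if s == best:
            continue
        excess = p.sum() - 1.0
        if excess <= 0:
            break
        p[s] = max(0.0, p[s] - excess)
    return p

p_hat = np.array([0.5, 0.3, 0.2])
value = np.array([1.0, 0.0, 2.0])
print(optimistic_transitions(p_hat, value, radius=0.4))  # mass moves toward state 2
```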