A Reinforcement Learning Algorithm for MDPs with Large State Spaces Based on Interval Estimation, Alexander L. Strehl, Michael L. Littman, 2008, Advances in Neural Information Processing Systems, Vol. 21 (Neural Information Processing Systems Foundation, Inc.) - Introduces Model-Based Interval Estimation with Exploration Bonuses (MBIE-EB), a foundational algorithm that formalizes count-based exploration using "optimism in the face of uncertainty" for tabular and potentially large state spaces.
Unifying Count-Based Exploration and Intrinsic Motivation, Marc G. Bellemare, Sriram Srinivasan, Georg Ostrovski, Tom Schaul, David Saxton, Remi Munos, 2016, Advances in Neural Information Processing Systems (NIPS), DOI: 10.48550/arXiv.1606.01868 - A seminal paper proposing the use of density models (the paper uses a Context Tree Switching model) to derive pseudo-counts that drive exploration in deep reinforcement learning, connecting count-based exploration with intrinsic motivation.