VIME: Variational Information Maximizing Exploration, Rein Houthooft, Xi Chen, Yan Duan, John Schulman, Filip De Turck, and Pieter Abbeel, 2016, Advances in Neural Information Processing Systems, Vol. 29 (Curran Associates, Inc.). DOI: 10.48550/arXiv.1605.09674 - Presents Variational Information Maximizing Exploration (VIME), an exploration method for deep reinforcement learning that rewards the agent for the information it gains about the environment's dynamics, using variational inference to make that information gain tractable to compute.
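To make the mechanism concrete, here is a minimal sketch of VIME's intrinsic-reward computation, assuming a fully factorized Gaussian variational posterior over the dynamics-model parameters; the function names and the bonus weight `eta` are illustrative, not taken from the paper's code:

```python
# Sketch of VIME's exploration bonus: the information gain about the dynamics
# model, measured as the KL divergence between the variational posterior
# after and before observing a transition. Assumes q(theta; phi) is a
# fully factorized Gaussian with parameters phi = (means, std devs).
import numpy as np

def gaussian_kl(mu_new, sig_new, mu_old, sig_old):
    """KL( N(mu_new, sig_new^2) || N(mu_old, sig_old^2) ), summed over all parameters."""
    return np.sum(
        np.log(sig_old / sig_new)
        + (sig_new**2 + (mu_new - mu_old) ** 2) / (2.0 * sig_old**2)
        - 0.5
    )

def vime_reward(extrinsic_reward, phi_old, phi_new, eta=0.01):
    """Augmented reward r' = r + eta * KL[q(theta; phi_new) || q(theta; phi_old)],
    where phi_new results from a variational update on the observed transition."""
    (mu_old, sig_old), (mu_new, sig_new) = phi_old, phi_new
    return extrinsic_reward + eta * gaussian_kl(mu_new, sig_new, mu_old, sig_old)
```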
Deep Exploration via Bootstrapped DQN, Ian Osband, Charles Blundell, Alexander Pritzel, and Benjamin Van Roy, 2016, Advances in Neural Information Processing Systems, Vol. 29 (Curran Associates, Inc.) - Describes Bootstrapped DQN, an exploration strategy that trains an ensemble of value-function heads to represent uncertainty over action values; committing to one randomly sampled head per episode yields temporally extended ("deep") exploration and acts as a tractable proxy for information gain.
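A minimal sketch of the episode-level action selection this entry describes, assuming each head is simply a callable from state to a vector of action values; the bootstrapped training of the heads is omitted:

```python
# Sketch of Bootstrapped DQN's action selection: sample one Q-head at the
# start of each episode and act greedily with it until the episode ends.
import random
import numpy as np

class BootstrappedPolicy:
    def __init__(self, q_heads):
        self.q_heads = q_heads      # ensemble of K Q-functions
        self.active_head = None

    def begin_episode(self):
        # Committing to one sampled head for a whole episode is what turns
        # ensemble uncertainty into temporally extended (deep) exploration.
        self.active_head = random.choice(self.q_heads)

    def act(self, state):
        return int(np.argmax(self.active_head(state)))

# Toy usage: K=10 random linear Q-heads over a 4-action, 3-dim-state problem.
rng = np.random.default_rng(0)
heads = [lambda s, W=rng.standard_normal((4, 3)): W @ s for _ in range(10)]
policy = BootstrappedPolicy(heads)
policy.begin_episode()
print(policy.act(np.array([0.1, -0.2, 0.3])))   # greedy action under the sampled head
```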
Bayesian Reinforcement Learning: A Survey, Mohammad Ghavamzadeh, Shie Mannor, Joelle Pineau, and Aviv Tamar, 2015, Foundations and Trends in Machine Learning, Vol. 8 (Now Publishers). DOI: 10.1561/2200000049 - Offers a thorough review of Bayesian reinforcement learning, explaining how probabilistic models and belief distributions over MDP parameters are maintained and updated to act under uncertainty, a core element of information-gain exploration.
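As one representative example of the belief machinery the survey covers, the sketch below maintains a Dirichlet posterior over tabular transition dynamics; the state/action sizes and the uniform prior are illustrative assumptions, not taken from the survey:

```python
# Sketch of a Bayesian belief over tabular MDP dynamics: a Dirichlet prior is
# conjugate to the categorical transition likelihood, so the posterior update
# is a simple count increment, and posterior sampling (Thompson sampling over
# MDPs) can then plan against a model drawn from the belief.
import numpy as np

n_states, n_actions = 5, 2
alpha = np.ones((n_states, n_actions, n_states))   # uniform prior: 1 pseudo-count

def observe(s, a, s_next):
    """Bayesian update after seeing transition (s, a) -> s_next."""
    alpha[s, a, s_next] += 1.0

def sample_dynamics(rng):
    """Draw one plausible transition model P(s' | s, a) from the posterior."""
    return np.array([
        [rng.dirichlet(alpha[s, a]) for a in range(n_actions)]
        for s in range(n_states)
    ])

rng = np.random.default_rng(0)
observe(0, 1, 3)
P = sample_dynamics(rng)       # shape (n_states, n_actions, n_states)
```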
PILCO: A Model-Based and Data-Efficient Approach to Policy Search, Marc Peter Deisenroth and Carl Edward Rasmussen, 2011, Proceedings of the 28th International Conference on Machine Learning (ICML) - Presents PILCO, a model-based reinforcement learning algorithm that uses Gaussian processes to maintain a probabilistic dynamics model. Explicitly tracking model uncertainty concentrates learning on poorly understood regions of the state space and makes policy optimization robust to model errors.
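To illustrate the uncertainty signal PILCO relies on, here is a minimal sketch of a GP dynamics model using scikit-learn as a stand-in, on synthetic data; PILCO itself propagates full Gaussian state distributions through the GP analytically, which this toy example does not attempt:

```python
# Sketch of the probabilistic dynamics model at the core of PILCO: GP
# regression from (state, action) to observed state change, whose predictive
# variance quantifies model uncertainty.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Toy transition data: rows of [state, action] -> noisy observed state change.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(50, 2))            # (state, action) pairs
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.05 * rng.standard_normal(50)

gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(X, y)

# The predictive standard deviation is the model-uncertainty signal: it is
# large where data are scarce, which is where exploration (or caution during
# policy optimization) is warranted.
mean, std = gp.predict(np.array([[0.0, 0.0], [5.0, 5.0]]), return_std=True)
print(mean, std)   # std is larger for the out-of-distribution query [5, 5]
```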