Variational Information Maximizing Exploration, Rein Houthooft, Xi Chen, Yan Duan, John Schulman, Filip De Turck, Pieter Abbeel, 2016. Advances in Neural Information Processing Systems, Vol. 29 (Curran Associates, Inc.). DOI: 10.48550/arXiv.1605.09674 - Introduces Variational Information Maximizing Exploration (VIME), a deep reinforcement learning method that uses variational inference to directly maximize the information gain about the environment's dynamics model for exploration.
Deep Exploration via Bootstrapped DQN, Ian Osband, Charles Blundell, Alexander Pritzel, Benjamin Van Roy, 2016. Advances in Neural Information Processing Systems 29 (Neural Information Processing Systems) - Describes deep exploration via bootstrapped DQN, a strategy that uses an ensemble of value functions to represent model uncertainty and guide exploration, serving as an approximation to information gain.
Bayesian Reinforcement Learning: A Survey, Mohammad Ghavamzadeh, Shie Mannor, Joelle Pineau, Aviv Tamar, 2015. Foundations and Trends in Machine Learning, Vol. 8 (Now Publishers). DOI: 10.1561/2200000049 - A comprehensive survey of Bayesian reinforcement learning, explaining how probabilistic models and belief distributions are used to handle uncertainty, a core element of information-gain-driven exploration.