A Tutorial on Thompson Sampling, Daniel J. Russo, Benjamin Van Roy, Abbas Kazerouni, Ian Osband, Zheng Wen, 2018, Foundations and Trends in Machine Learning, Vol. 11 (Now Publishers), DOI: 10.1561/2200000070 - A comprehensive tutorial covering the foundations, theory, and applications of Thompson sampling in a range of settings, including reinforcement learning.
Deep Exploration via Bootstrapped DQN, Ian Osband, Charles Blundell, Alexander Pritzel, Benjamin Van Roy, 2016, Advances in Neural Information Processing Systems, Vol. 29 (NeurIPS) - Introduces Bootstrapped DQN, a practical and widely used approximation to Thompson sampling for deep reinforcement learning.