Bandit-based Monte-Carlo Planning, Levente Kocsis and Csaba Szepesvári, 2006. Machine Learning: ECML 2006, 17th European Conference on Machine Learning, Vol. 4212 (Springer). DOI: 10.1007/11871842_29 - Introduces the Upper Confidence bounds applied to Trees (UCT) algorithm, the rule at the heart of the Selection phase in MCTS; a sketch of the UCT selection rule follows this list.
Reinforcement Learning: An Introduction, Richard S. Sutton and Andrew G. Barto, 2018 (MIT Press) - A comprehensive textbook on reinforcement learning, explaining MCTS within the context of planning and model-based RL.
Mastering the game of Go with deep neural networks and tree search, David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, and Demis Hassabis, 2016. Nature, Vol. 529. DOI: 10.1038/nature16961 - A landmark paper demonstrating the successful application of MCTS, combined with deep learning, to achieve superhuman performance in the game of Go (AlphaGo).
A Survey of Monte Carlo Tree Search Methods, Cameron Browne, Edward Powley, Daniel Whitehouse, Simon Lucas, Peter I. Cowling, Philipp Rohlfshagen, Stephen Tavener, Diego Perez, Spyridon Samothrakis, and Simon Colton, 2012. IEEE Transactions on Computational Intelligence and AI in Games, Vol. 4 (IEEE). DOI: 10.1109/TCIAIG.2012.2186810 - A comprehensive survey providing an overview of MCTS algorithms, their theoretical foundations, and diverse applications.
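For readers who want a concrete picture of what the Kocsis and Szepesvári reference contributes, the following is a minimal Python sketch of the UCT selection rule, which scores each child by its mean value plus an exploration bonus. The `Node` class, field names, and the exploration constant `c` are illustrative assumptions, not taken from any of the papers above.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    visits: int = 0            # N(s): how many simulations passed through this node
    total_value: float = 0.0   # sum of rewards backed up through this node
    children: list = field(default_factory=list)

def uct_select(parent: Node, c: float = math.sqrt(2)) -> Node:
    """Pick the child maximising Q/N + c * sqrt(ln(N_parent) / N_child),
    the UCB1-style score that UCT applies at each tree node."""
    def score(child: Node) -> float:
        if child.visits == 0:
            return float("inf")  # visit every child at least once before exploiting
        exploit = child.total_value / child.visits
        explore = c * math.sqrt(math.log(parent.visits) / child.visits)
        return exploit + explore
    return max(parent.children, key=score)
```

In a full MCTS loop this rule would be applied repeatedly from the root during the Selection phase, with the constant `c` tuned to trade exploration against exploitation.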