Bandit based Monte-Carlo Planning, Levente Kocsis and Csaba Szepesvári, 2006. Machine Learning: ECML 2006, Lecture Notes in Computer Science, Vol. 4212 (Springer). DOI: 10.1007/11871842_29 - Presents the Upper Confidence bounds applied to Trees (UCT) algorithm, a core component of Monte Carlo Tree Search.
A General Reinforcement Learning Algorithm that Masters Chess, Shogi, and Go through Self-Play, David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, and Demis Hassabis, 2018. Science, Vol. 362 (American Association for the Advancement of Science). DOI: 10.1126/science.aar6404 - Describes the AlphaZero algorithm, which integrates deep neural networks with MCTS for general game playing. This work is highly relevant to the 'Sophistications' section.