Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model, Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, Timothy Lillicrap, and David Silver, 2020. Nature, Vol. 588. DOI: 10.1038/s41586-020-03051-4 - Describes a model-based reinforcement learning method that learns a dynamics model and plans with Monte Carlo tree search, achieving high performance across a variety of complex environments while incurring substantial computational demands.