Human-level control through deep reinforcement learning, Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller, et al., 2015. Nature, Vol. 518. DOI: 10.1038/nature14236 - Introduces the Deep Q-Network (DQN) algorithm and the target network mechanism for stabilizing training.
Reinforcement Learning: An Introduction, Richard S. Sutton and Andrew G. Barto, 2018 (MIT Press, 2nd edition) - A comprehensive textbook covering the theoretical foundations of reinforcement learning, including Q-learning, with a case-study discussion of DQN and stabilizing modifications such as target networks.
Continuous Control with Deep Reinforcement Learning, Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, Daan Wierstra, 2015. International Conference on Learning Representations (ICLR 2016). DOI: 10.48550/arXiv.1509.02971 - Presents the Deep Deterministic Policy Gradient (DDPG) algorithm, which popularized soft target updates (Polyak averaging), a method mentioned in this section.