Human-level control through deep reinforcement learning, Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis, 2015. Nature, Vol. 518. DOI: 10.1038/nature14236 - Presents the foundational Deep Q-Network (DQN) architecture, applying convolutional neural networks to image observations for discrete action control.
Dueling Network Architectures for Deep Reinforcement Learning, Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, Nando de Freitas, 2016. Proceedings of the 33rd International Conference on Machine Learning (ICML), Vol. 48. DOI: 10.48550/arXiv.1511.06581 - Introduces the Dueling Network Architecture, which partitions the network to estimate state value and action advantages separately for value-based deep reinforcement learning.
Proximal Policy Optimization Algorithms, John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov, 2017. DOI: 10.48550/arXiv.1707.06347 - Proposes the Proximal Policy Optimization (PPO) algorithm and illustrates network architectures suitable for actor-critic methods, including designs for continuous action spaces and shared feature layers.
Deep Learning, Ian Goodfellow, Yoshua Bengio, Aaron Courville, 2016. MIT Press - Offers a comprehensive overview of deep learning, covering foundational neural network architectures (MLPs, CNNs, RNNs), activation functions, and regularization methods relevant for deep reinforcement learning.