Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - Comprehensive coverage of deep learning fundamentals, including network architecture, optimization challenges, and various deep learning models.
Deep Residual Learning for Image Recognition, Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, 2015, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), DOI: 10.48550/arXiv.1512.03385 - Introduces residual connections (identity shortcuts) that enable training of very deep neural networks by easing optimization and gradient flow through the layers.
Long Short-Term Memory, Sepp Hochreiter and Jürgen Schmidhuber, 1997, Neural Computation, Vol. 9, DOI: 10.1162/neco.1997.9.8.1735 - Foundational paper introducing Long Short-Term Memory (LSTM) networks, designed to address vanishing and exploding gradients in recurrent neural networks.