Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - This comprehensive textbook covers foundational concepts in deep learning, including optimization, regularization, and common training challenges such as overfitting, underfitting, and gradient issues.
Deep Residual Learning for Image Recognition, Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, 2016, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE). DOI: 10.1109/CVPR.2016.90 - Introduces residual networks, an architectural innovation that effectively addresses the vanishing gradient problem in deep neural networks, enabling the training of models with many layers.
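The core idea is an identity skip connection that adds a block's input to its output, giving gradients a direct path backward through the network. Below is a minimal PyTorch sketch of such a block; the class name, channel counts, and the omission of batch normalization and projection shortcuts are simplifications for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Simplified residual block: output = relu(F(x) + x)."""

    def __init__(self, channels: int):
        super().__init__()
        # Illustrative two-convolution residual branch F(x).
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The "+ x" skip connection lets gradients flow through the
        # identity path, which is what mitigates vanishing gradients.
        return self.relu(self.conv2(self.relu(self.conv1(x))) + x)
```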
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, Sergey Ioffe and Christian Szegedy, 2015, Proceedings of the 32nd International Conference on Machine Learning (ICML) - Presents Batch Normalization, a technique to normalize layer inputs across mini-batches, which stabilizes training, speeds up convergence, and mitigates vanishing/exploding gradients.
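As a rough illustration of the transform itself (not the full algorithm, which also tracks running statistics for use at inference time), the function below normalizes each feature over a mini-batch and applies learned scale and shift parameters; the function and argument names are placeholders.

```python
import torch

def batch_norm(x: torch.Tensor, gamma: torch.Tensor, beta: torch.Tensor,
               eps: float = 1e-5) -> torch.Tensor:
    """Normalize each feature over the mini-batch, then scale and shift.

    Sketch of the training-time transform for a 2-D input of shape
    (batch, features); gamma and beta are learned per-feature parameters.
    """
    mean = x.mean(dim=0)                 # per-feature batch mean
    var = x.var(dim=0, unbiased=False)   # per-feature batch variance
    x_hat = (x - mean) / torch.sqrt(var + eps)
    return gamma * x_hat + beta
```

In practice one would use a library layer such as torch.nn.BatchNorm1d or BatchNorm2d rather than this hand-rolled version.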
On the difficulty of training recurrent neural networks, Razvan Pascanu, Tomas Mikolov, and Yoshua Bengio, 2013, International Conference on Machine Learning (ICML) - A foundational analysis of the exploding and vanishing gradient problems that arise when training recurrent networks over long sequences, proposing gradient clipping as a countermeasure for exploding gradients.
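A minimal PyTorch sketch of gradient-norm clipping, applied between the backward pass and the optimizer step; the toy model, loss, learning rate, and max_norm value are placeholders, and clip_grad_norm_ is the library utility rather than the paper's own code.

```python
import torch

model = torch.nn.Linear(10, 1)                   # toy stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

loss = model(torch.randn(8, 10)).pow(2).mean()   # dummy loss
loss.backward()

# If the global gradient norm exceeds max_norm, rescale all gradients so
# the norm equals max_norm; this bounds the size of the parameter update
# without changing its direction, countering exploding gradients.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```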
Convolutional Neural Networks for Visual Recognition (CS231n) Lecture Notes, Stanford University (Course Staff), 2023 - These widely recognized online course notes provide practical guidance on training, debugging, and monitoring deep learning models, including discussions on common pitfalls and best practices.