Deep Residual Learning for Image Recognition, Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, 2016. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). DOI: 10.48550/arXiv.1512.03385 - This paper introduced Residual Networks (ResNets) and residual (shortcut) connections, which made much deeper networks trainable by addressing the degradation problem, in which accuracy saturates and then drops as plain networks grow deeper.
Deep Learning, Ian Goodfellow, Yoshua Bengio, Aaron Courville, 2016 (MIT Press) - A standard textbook for deep learning that explains the challenges of training deep networks, such as vanishing/exploding gradients, and architectural solutions like skip connections.