Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - A standard textbook covering essential concepts of deep learning, including detailed discussions on network architectures, design principles, and training challenges.
Understanding the difficulty of training deep feedforward neural networks, Xavier Glorot, Yoshua Bengio, 2010, Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS), Vol. 9 (PMLR), DOI: 10.5555/3104322.3104344 - This paper investigates why gradients vanish or explode when training deep networks and proposes the normalized weight initialization scheme now commonly known as Xavier (Glorot) initialization.