Understanding the difficulty of training deep feedforward neural networks, Xavier Glorot and Yoshua Bengio, 2010. Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS), Vol. 9 (PMLR) - Introduces Xavier initialization, a method for setting initial weights to help maintain activation variance and mitigate vanishing/exploding gradients in deep networks.
Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - Provides a comprehensive theoretical and practical discussion of parameter initialization techniques, including their motivations and challenges (see Section 8.4).
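To make the idea behind the first reference concrete, here is a minimal sketch of Xavier (Glorot) uniform initialization in plain NumPy. The function name `xavier_uniform` and the layer sizes are illustrative choices, not from either reference; the bound sqrt(6 / (fan_in + fan_out)) is the one proposed in the 2010 paper, chosen so that activation and gradient variances stay roughly constant across layers.

```python
import numpy as np

def xavier_uniform(fan_in, fan_out, rng=None):
    # Sample a (fan_in, fan_out) weight matrix from U(-a, a) with
    # a = sqrt(6 / (fan_in + fan_out)). With inputs of unit variance,
    # this keeps the variance of activations (and of backpropagated
    # gradients) approximately the same from layer to layer.
    rng = rng if rng is not None else np.random.default_rng(0)
    a = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-a, a, size=(fan_in, fan_out))

# Hypothetical layer sizes, chosen only for illustration.
W = xavier_uniform(256, 128)
print(W.shape)                                # (256, 128)
print(float(np.abs(W).max()) <= np.sqrt(6.0 / (256 + 128)))  # True
```

Deep-learning frameworks ship equivalent routines (e.g. `torch.nn.init.xavier_uniform_` in PyTorch), so in practice one rarely writes this by hand.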