Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - Authoritative textbook covering the foundations of deep learning, including comprehensive sections on generalization, regularization techniques (L1/L2, Dropout, Early Stopping, Data Augmentation), and optimization algorithms (Gradient Descent, SGD, Momentum, adaptive methods).
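To make the surveyed ideas concrete, here is a minimal sketch (not taken from the book) combining two of them: an L2 penalty added to a loss, and SGD with momentum. The toy linear-regression data and hyperparameter names (`lr`, `momentum`, `weight_decay`) are illustrative choices, not the book's notation.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # toy inputs
y = X @ np.array([1.0, -2.0, 0.5])     # toy linear targets

w = np.zeros(3)                        # parameters
v = np.zeros(3)                        # momentum buffer
lr, momentum, weight_decay = 0.1, 0.9, 1e-3

for _ in range(200):
    grad = 2 * X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
    grad += 2 * weight_decay * w           # gradient of the L2 penalty ||w||^2
    v = momentum * v - lr * grad           # momentum accumulates past gradients
    w += v

print(w)  # close to [1.0, -2.0, 0.5], shrunk slightly by the penalty
```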
Dropout: A Simple Way to Prevent Neural Networks from Overfitting, Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov, 2014 (Journal of Machine Learning Research, Vol. 15) - The seminal paper introducing Dropout, a widely used regularization technique for neural networks.
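A minimal sketch of the idea: during training, each activation is zeroed with probability p so units cannot co-adapt. This sketch uses the common "inverted" variant, which rescales surviving activations during training (the paper instead scales weights at test time); the function name and toy activations are illustrative.

```python
import numpy as np

def dropout(activations, p=0.5, training=True, rng=None):
    if not training:
        return activations                     # no-op at inference
    rng = rng or np.random.default_rng()
    mask = rng.random(activations.shape) >= p  # keep each unit with prob 1 - p
    return activations * mask / (1.0 - p)      # rescale to preserve expectation

h = np.ones((2, 4))       # toy hidden-layer activations
print(dropout(h, p=0.5))  # roughly half the entries zeroed, the rest scaled to 2.0
```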
Adam: A Method for Stochastic Optimization, Diederik P. Kingma and Jimmy Ba, 2015 (International Conference on Learning Representations, ICLR), DOI: 10.48550/arXiv.1412.6980 - Introduces the Adam optimization algorithm, which combines the benefits of AdaGrad and RMSprop and is widely used in deep learning.
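A minimal sketch of the update rule from the paper: exponential moving averages of the gradient (m, as in momentum) and its elementwise square (v, as in RMSprop), each bias-corrected for their zero initialization. The moment-decay defaults follow the paper; the toy objective and learning rate are illustrative.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad     # first-moment estimate (like momentum)
    v = b2 * v + (1 - b2) * grad**2  # second-moment estimate (like RMSprop)
    m_hat = m / (1 - b1**t)          # bias correction for zero initialization
    v_hat = v / (1 - b2**t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

w = np.array([5.0, -3.0])
m = v = np.zeros_like(w)
for t in range(1, 501):                       # t starts at 1 for bias correction
    grad = 2 * w                              # gradient of f(w) = ||w||^2
    w, m, v = adam_step(w, grad, m, v, t, lr=0.1)
print(w)                                      # approaches the minimum at [0, 0]
```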