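Layer Normalization, Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton, 2016, arXiv preprint. DOI: 10.48550/arXiv.1607.06450 - Introduces Layer Normalization, an alternative to Batch Normalization that computes normalization statistics over the features of each individual example. Because it does not depend on batch size, it is well suited to recurrent neural networks, and it was later adopted as the standard normalization in Transformers.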
Deep Learning, Ian Goodfellow, Yoshua Bengio, Aaron Courville, 2016, MIT Press - A comprehensive textbook that explains deep learning optimization algorithms, regularization techniques, and practical training methods in detail.