Deep Learning, Ian Goodfellow, Yoshua Bengio, Aaron Courville, 2016 (MIT Press) - Provides a comprehensive academic overview of deep learning, including detailed explanations of internal covariate shift and its impact on optimization.
How Does Batch Normalization Help Optimization?, Shibani Santurkar, Dimitris Tsipras, Andrew Ilyas, Aleksander Madry, 2018, Advances in Neural Information Processing Systems, Vol. 31 (Neural Information Processing Systems Foundation, Inc. (NeurIPS)), DOI: 10.55917/fu.2018.528 - Investigates the underlying reasons for Batch Normalization's effectiveness, arguing that it aids optimization primarily by smoothing the loss landscape rather than by reducing internal covariate shift.