Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - Chapter 8 of this authoritative textbook provides a comprehensive overview of optimization challenges, including the characteristics of loss landscapes in deep learning.
Visualizing the Loss Landscape of Neural Networks, Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer, Tom Goldstein, 2018, Advances in Neural Information Processing Systems, Vol. 31 (Neural Information Processing Systems Foundation). DOI: 10.48550/arXiv.1712.09913 - This paper introduces methods for effectively visualizing the high-dimensional loss landscapes of neural networks, helping to illustrate concepts like sharp and flat minima.
Optimization Methods for Large-Scale Machine Learning, Léon Bottou, Frank E. Curtis, and Jorge Nocedal, 2018, SIAM Review, Vol. 60 (Society for Industrial and Applied Mathematics). DOI: 10.1137/16M1080173 - This review article surveys a wide range of optimization methods, discussing their applicability and challenges in large-scale machine learning, with specific attention to deep learning's non-convexity and high-dimensionality.