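Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - Standard textbook. Chapter 8, "Optimization for Training Deep Models", covers the optimization challenges specific to deep networks, including ill-conditioning, local minima, plateaus, saddle points, and exploding gradients.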
Identifying and Attacking the Saddle Point Problem in High-Dimensional Non-Convex Optimization, Yann N. Dauphin, Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Surya Ganguli, and Yoshua Bengio, 2014, Advances in Neural Information Processing Systems (NeurIPS), Vol. 27 - Foundational paper arguing that in high-dimensional non-convex optimization, saddle points rather than local minima are the main obstacle. It also proposes the saddle-free Newton method.
How to Escape Saddle Points Efficiently, Chi Jin, Rong Ge, Praneeth Netrapalli, Sham M. Kakade, and Michael I. Jordan, 2017, International Conference on Machine Learning (ICML), Vol. 70, Proceedings of Machine Learning Research (PMLR), DOI: 10.5555/3305890.3306000 - Shows that perturbed gradient descent escapes saddle points and reaches approximate second-order stationary points in a number of iterations that depends only polylogarithmically on the dimension.
Optimization Algorithms for Deep Learning, Xiangxiang Zhang, Anna Choromanska, and Yann LeCun, 2019, arXiv preprint arXiv:1904.12260 - Comprehensive review of optimization techniques for deep learning and the challenges associated with them.