Numerical Optimization, Jorge Nocedal and Stephen J. Wright, 2006 (Springer). DOI: 10.1007/978-0-387-40065-5 - Comprehensive coverage of the Hessian matrix, its properties, and its role in second-order optimization methods such as Newton's method.
Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - Discusses the Hessian matrix in the context of deep learning optimization, including its computational challenges and the use of Hessian-vector products.
Automatic Differentiation in Machine Learning: A Survey, Atilim Gunes Baydin, Barak A. Pearlmutter, Alexey Andreyevich Radul, and Jeffrey Mark Siskind, 2018 (Journal of Machine Learning Research, Vol. 18). DOI: 10.5555/3277526.3277543 - Surveys automatic differentiation techniques and explains how they compute gradients and Hessian-vector products, which makes them useful for large-scale machine learning.