Deep Learning, Ian Goodfellow, Yoshua Bengio, Aaron Courville, 2016 (MIT Press) - Introduces optimization algorithms, loss functions, and the role of derivatives/gradients in training machine learning models, especially deep neural networks.
Pattern Recognition and Machine Learning, Christopher M. Bishop, 2006 (Springer) - Provides a comprehensive foundation for machine learning, with detailed explanations of loss functions, model training, and the mathematical principles of optimization for various algorithms.
Convex Optimization, Stephen Boyd, Lieven Vandenberghe, 2004 (Cambridge University Press) - A foundational textbook covering the theory and applications of convex optimization, explaining problem formulation and the mathematical properties of the minima and maxima of functions.
CS229 Lecture Notes: Introduction to Machine Learning, Andrew Ng, Tengyu Ma, 2023 (Stanford University) - Covers the fundamental principles of machine learning, including the definition of cost functions and the application of gradient descent for model optimization.