Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - Covers fundamental optimization concepts and algorithms used in machine learning, including cost functions and gradient descent, within the context of deep learning.
An Introduction to Statistical Learning: With Applications in R (1st edition), Gareth James, Daniela Witten, Trevor Hastie, and Rob Tibshirani, 2013 (Springer), DOI: 10.1007/978-1-4614-7138-7 - Provides an accessible introduction to statistical learning methods, explaining optimization through examples such as linear regression and the minimization of the residual sum of squares.
CS229 Lecture Notes: Supervised Learning, Andrew Ng, 2018 (Stanford University) - Specifically covers linear regression, cost functions (e.g., MSE), and the introduction of gradient descent as an optimization method.