Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - Offers a thorough theoretical treatment of optimization algorithms, including the derivation and operation of gradient descent, which applies to many machine learning models.