Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press). Essential for understanding how the gradient is used in machine learning optimization algorithms such as gradient descent; it provides the mathematical basis for model training.