Partial derivatives provide the means to understand how functions with multiple inputs, like f(x, y), change. For instance, ∂f/∂x describes how f changes as x varies, keeping y fixed. Similarly, ∂f/∂y indicates how f changes as y varies, while x is held constant.
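To make "holding the other variable fixed" concrete, here is a minimal numerical sketch. The example function g(x, y) = x²y and the step size h are illustrative assumptions, not part of the text; the idea is just to nudge one input while literally leaving the other untouched.

```python
# A minimal sketch: approximate partial derivatives of an example
# function g(x, y) = x**2 * y with central finite differences.
# The function g and the step size h are illustrative assumptions.

def g(x, y):
    return x**2 * y

def partial_x(f, x, y, h=1e-6):
    # Vary x, hold y fixed.
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def partial_y(f, x, y, h=1e-6):
    # Vary y, hold x fixed.
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

print(partial_x(g, 2.0, 3.0))  # ~12.0, since ∂g/∂x = 2xy
print(partial_y(g, 2.0, 3.0))  # ~4.0,  since ∂g/∂y = x**2
```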
But what if we want a single object that captures the rate of change with respect to all input variables simultaneously? That's where the gradient vector comes in.
The gradient of a function f is simply a vector where each component is a partial derivative of f. If our function has two inputs, x and y, its gradient is a two-dimensional vector. If it has n inputs, x₁, x₂, …, xₙ, its gradient is an n-dimensional vector.
We usually denote the gradient using the nabla symbol, ∇. For a function f(x,y), the gradient is written as ∇f or ∇f(x,y) and is defined as:
\[
\nabla f(x, y) =
\begin{bmatrix}
\dfrac{\partial f}{\partial x} \\[6pt]
\dfrac{\partial f}{\partial y}
\end{bmatrix}
\]
Sometimes you might also see it written horizontally using angle brackets: ∇f(x, y) = ⟨∂f/∂x, ∂f/∂y⟩.
For a function with n variables, f(x₁, x₂, …, xₙ), the gradient is:
\[
\nabla f =
\begin{bmatrix}
\dfrac{\partial f}{\partial x_1} \\[6pt]
\dfrac{\partial f}{\partial x_2} \\[6pt]
\vdots \\[6pt]
\dfrac{\partial f}{\partial x_n}
\end{bmatrix}
\]
Essentially, the gradient packages up all the first-order partial derivatives into one convenient vector.
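As a rough illustration of this packaging, here is a minimal sketch (not from the text) that approximates each partial derivative numerically and stacks the results into one NumPy vector. The helper name numerical_gradient and the step size h are illustrative choices.

```python
# A minimal sketch: approximate the gradient of f at a point by
# perturbing one coordinate at a time and stacking the resulting
# partial derivatives into a single vector.
import numpy as np

def numerical_gradient(f, point, h=1e-6):
    point = np.asarray(point, dtype=float)
    grad = np.zeros_like(point)
    for i in range(point.size):
        step = np.zeros_like(point)
        step[i] = h
        # Central difference in coordinate i; all other inputs held fixed.
        grad[i] = (f(point + step) - f(point - step)) / (2 * h)
    return grad

# Example with f(x, y) = x**2 + 5*x*y (the function used in the example below):
f = lambda p: p[0]**2 + 5 * p[0] * p[1]
print(numerical_gradient(f, [1.0, 2.0]))  # ≈ [12., 5.]
```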
Example: Calculating a Gradient
Let's take the function f(x, y) = x² + 5xy.
Find the partial derivatives:
Treat y as a constant to find ∂f/∂x:
\[
\frac{\partial f}{\partial x} = \frac{\partial}{\partial x}\left(x^2\right) + \frac{\partial}{\partial x}\left(5xy\right) = 2x + 5y
\]
Treat x as a constant to find ∂f/∂y:
\[
\frac{\partial f}{\partial y} = \frac{\partial}{\partial y}\left(x^2\right) + \frac{\partial}{\partial y}\left(5xy\right) = 0 + 5x = 5x
\]
Assemble the gradient vector:
\[
\nabla f(x, y) =
\begin{bmatrix}
\dfrac{\partial f}{\partial x} \\[6pt]
\dfrac{\partial f}{\partial y}
\end{bmatrix}
=
\begin{bmatrix}
2x + 5y \\
5x
\end{bmatrix}
\]
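If you want to double-check these partial derivatives by machine, a small SymPy sketch reproduces them. SymPy is an assumption here; the text itself does not use it.

```python
# A minimal sketch using SymPy (an assumed tool, not part of the text)
# to verify the partial derivatives of f(x, y) = x**2 + 5*x*y.
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 + 5*x*y

grad = [sp.diff(f, x), sp.diff(f, y)]
print(grad)  # [2*x + 5*y, 5*x]
```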
This gradient, ∇f(x,y), gives us a vector that depends on the point (x,y) we are considering. For instance, at the point (1,2):
\[
\nabla f(1, 2) =
\begin{bmatrix}
2(1) + 5(2) \\
5(1)
\end{bmatrix}
=
\begin{bmatrix}
2 + 10 \\
5
\end{bmatrix}
=
\begin{bmatrix}
12 \\
5
\end{bmatrix}
\]
At the point (−1,0):
\[
\nabla f(-1, 0) =
\begin{bmatrix}
2(-1) + 5(0) \\
5(-1)
\end{bmatrix}
=
\begin{bmatrix}
-2 \\
-5
\end{bmatrix}
\]
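The same evaluations can be checked in a few lines of NumPy. This is a minimal sketch; the function name grad_f is an illustrative choice, not something defined in the text.

```python
# A minimal sketch: evaluate the gradient formula [2x + 5y, 5x] at points.
import numpy as np

def grad_f(x, y):
    # Gradient of f(x, y) = x**2 + 5*x*y, derived above.
    return np.array([2*x + 5*y, 5*x])

print(grad_f(1, 2))    # [12  5]
print(grad_f(-1, 0))   # [-2 -5]
```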
So, the gradient is itself a function that takes a point (x, y) as input and outputs a vector. This vector holds important information about how the function f behaves around that specific point, which we'll examine more closely in the next section. In machine learning, the gradient of a cost function tells us how to adjust model parameters (like the inputs x₁, x₂, …, xₙ) to change the cost.
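As a preview of that use, here is a minimal sketch (an illustrative assumption, not the text's method) of a single parameter update: nudging the inputs a small step against the gradient decreases the value of f.

```python
# A minimal sketch (illustrative, not from the text): take one small step
# against the gradient of f(x, y) = x**2 + 5*x*y to decrease its value.
import numpy as np

def f(p):
    x, y = p
    return x**2 + 5*x*y

def grad_f(p):
    x, y = p
    return np.array([2*x + 5*y, 5*x])

p = np.array([1.0, 2.0])
learning_rate = 0.01              # step size, an illustrative choice
p_new = p - learning_rate * grad_f(p)

print(f(p), f(p_new))  # the value at p_new is lower than at p
```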