Okay, you've learned how to calculate partial derivatives for functions with multiple inputs, like $f(x, y)$. We found $\frac{\partial f}{\partial x}$, which tells us how $f$ changes as $x$ changes (keeping $y$ fixed), and $\frac{\partial f}{\partial y}$, which tells us how $f$ changes as $y$ changes (keeping $x$ fixed).
But what if we want a single object that captures the rate of change with respect to all input variables simultaneously? That's where the gradient vector comes in.
The gradient of a function $f$ is simply a vector where each component is a partial derivative of $f$. If our function has two inputs, $x$ and $y$, its gradient is a two-dimensional vector. If it has $n$ inputs, $x_1, x_2, \dots, x_n$, its gradient is an $n$-dimensional vector.
We usually denote the gradient using the nabla symbol, $\nabla$. For a function $f(x, y)$, the gradient is written as $\nabla f$ or $\nabla f(x, y)$ and is defined as:
$$
\nabla f(x, y) = \begin{bmatrix} \dfrac{\partial f}{\partial x} \\[6pt] \dfrac{\partial f}{\partial y} \end{bmatrix}
$$

Sometimes you might also see it written horizontally using angle brackets: $\nabla f(x, y) = \left\langle \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right\rangle$.
For a function with $n$ variables, $f(x_1, x_2, \dots, x_n)$, the gradient is:
$$
\nabla f = \begin{bmatrix} \dfrac{\partial f}{\partial x_1} \\[6pt] \dfrac{\partial f}{\partial x_2} \\[6pt] \vdots \\[6pt] \dfrac{\partial f}{\partial x_n} \end{bmatrix}
$$

Essentially, the gradient packages up all the first-order partial derivatives into one convenient vector.
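To make this concrete, here is a minimal Python sketch (the helper name `numerical_gradient` is just illustrative, not from any particular library) that estimates each component of the gradient with a central finite difference. It is handy for checking hand-computed partial derivatives:

```python
def numerical_gradient(f, point, h=1e-6):
    """Approximate the gradient of f at `point` using central differences.

    f: a function that takes a list of n numbers and returns a single number.
    point: the input (x1, ..., xn) at which to estimate the gradient.
    Returns a list of n partial-derivative estimates, one per input variable.
    """
    point = list(point)
    grad = []
    for i in range(len(point)):
        forward = point.copy()
        backward = point.copy()
        forward[i] += h    # nudge the i-th input up
        backward[i] -= h   # nudge the i-th input down
        # Central difference: (f(x + h*e_i) - f(x - h*e_i)) / (2h)
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad

# Quick check against the example below, f(x, y) = x^2 + 5xy, at the point (1, 2):
print(numerical_gradient(lambda p: p[0]**2 + 5 * p[0] * p[1], (1, 2)))  # approximately [12.0, 5.0]
```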
Let's take the function $f(x, y) = x^2 + 5xy$.
Find the partial derivatives: $\frac{\partial f}{\partial x} = 2x + 5y$ and $\frac{\partial f}{\partial y} = 5x$.
Assemble the gradient vector:
$$
\nabla f(x, y) = \begin{bmatrix} \dfrac{\partial f}{\partial x} \\[6pt] \dfrac{\partial f}{\partial y} \end{bmatrix} = \begin{bmatrix} 2x + 5y \\ 5x \end{bmatrix}
$$

This gradient, $\nabla f(x, y)$, gives us a vector that depends on the point $(x, y)$ we are considering. For instance, at the point $(1, 2)$:
$$
\nabla f(1, 2) = \begin{bmatrix} 2(1) + 5(2) \\ 5(1) \end{bmatrix} = \begin{bmatrix} 2 + 10 \\ 5 \end{bmatrix} = \begin{bmatrix} 12 \\ 5 \end{bmatrix}
$$

At the point $(-1, 0)$:
$$
\nabla f(-1, 0) = \begin{bmatrix} 2(-1) + 5(0) \\ 5(-1) \end{bmatrix} = \begin{bmatrix} -2 \\ -5 \end{bmatrix}
$$

So, the gradient is itself a function that takes a point $(x, y)$ as input and outputs a vector. This vector holds important information about how the function $f$ behaves around that specific point, which we'll examine more closely in the next section. In machine learning, the gradient of a cost function tells us how to adjust model parameters (like the inputs $x_1, x_2, \dots, x_n$) to change the cost.
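As a quick check of the worked example, here is a short Python sketch (the function names `f` and `grad_f` are just for illustration) that implements this particular gradient analytically and evaluates it at the two points above:

```python
def f(x, y):
    """The example function f(x, y) = x^2 + 5xy."""
    return x**2 + 5 * x * y

def grad_f(x, y):
    """Analytic gradient of f: [df/dx, df/dy] = [2x + 5y, 5x]."""
    return [2 * x + 5 * y, 5 * x]

print(grad_f(1, 2))    # [12, 5]
print(grad_f(-1, 0))   # [-2, -5]
```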