Understanding gradient descent is helpful, but seeing it in action makes the idea much clearer. Let's visualize how taking steps based on the derivative helps us find the minimum point of a function.
Imagine our cost function is like a valley or a bowl shape. Our goal is to find the very bottom of this valley. Think of yourself standing somewhere on the slope of this valley. How do you get to the bottom? You'd look at the ground beneath your feet to see which way is downhill and take a step in that direction.
The derivative, $f'(x)$, gives us exactly this information: the slope of the function at our current position, $x$.
Gradient descent uses this slope information to take iterative steps. At each step, it calculates the derivative at the current point $x_{\text{old}}$ and updates the position using the rule:

$$x_{\text{new}} = x_{\text{old}} - \eta \times f'(x_{\text{old}})$$

Here, $\eta$ (eta) is the learning rate, a small positive number we discussed earlier that controls how big each step is. Notice the minus sign:

- If the slope $f'(x_{\text{old}})$ is positive (the function rises to the right), we subtract a positive amount, so $x$ moves to the left.
- If the slope is negative (the function rises to the left), we subtract a negative amount, so $x$ moves to the right.
In both cases, the update moves x in the downhill direction.
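To make the update rule concrete, here is a minimal Python sketch of the loop it describes. The function name `gradient_descent` and the choice of stopping after a fixed number of steps are illustrative assumptions, not part of any particular library.

```python
def gradient_descent(f_prime, x_start, learning_rate, n_steps):
    """Repeatedly apply x_new = x_old - learning_rate * f_prime(x_old)."""
    x = x_start
    history = [x]                              # record the path for inspection
    for _ in range(n_steps):
        x = x - learning_rate * f_prime(x)     # step in the downhill direction
        history.append(x)
    return history
```

In practice you might stop when the change in $x$ falls below a small threshold rather than after a hard-coded number of steps; a fixed count keeps the sketch simple.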
Let's visualize this with a simple quadratic function, like $f(x) = (x - 3)^2 + 2$. We know its minimum occurs at $x = 3$. The derivative is $f'(x) = 2(x - 3)$. Let's start at $x = 0$ and use a learning rate $\eta = 0.2$.
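Here is a short sketch of those iterations in Python; the choice of ten steps is arbitrary and only for illustration.

```python
def f(x):
    return (x - 3) ** 2 + 2      # the quadratic cost function

def f_prime(x):
    return 2 * (x - 3)           # its derivative (the slope)

eta = 0.2                        # learning rate
x = 0.0                          # starting point
for step in range(10):
    print(f"step {step:2d}: x = {x:.4f}, f(x) = {f(x):.4f}")
    x = x - eta * f_prime(x)     # gradient descent update
```

The printed positions move from 0 to 1.2, then 1.92, 2.352, and so on, getting closer to 3 with each step without overshooting it at this learning rate.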
The blue line shows the function $f(x) = (x - 3)^2 + 2$. The orange points and dotted line show the path taken by gradient descent starting at $x = 0$. Each step moves closer to the minimum at $x = 3$.
In the chart above, the steps are large at first, where the curve is steep and the derivative has a large magnitude, and become progressively smaller as the point approaches $x = 3$, where the slope flattens toward zero.
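If you want to reproduce a chart like this yourself, the following matplotlib sketch plots the function and overlays the gradient descent path. The styling details (colors, dotted line, number of steps) are assumptions chosen to match the description above.

```python
import numpy as np
import matplotlib.pyplot as plt

def f(x):
    return (x - 3) ** 2 + 2

def f_prime(x):
    return 2 * (x - 3)

# Run gradient descent and record the path
eta, x = 0.2, 0.0
path = [x]
for _ in range(10):
    x = x - eta * f_prime(x)
    path.append(x)
path = np.array(path)

# Plot the function (blue line) and the descent path (orange points, dotted line)
xs = np.linspace(-0.5, 6.5, 200)
plt.plot(xs, f(xs), color="blue", label="f(x) = (x - 3)^2 + 2")
plt.plot(path, f(path), "o:", color="orange", label="gradient descent path")
plt.xlabel("x")
plt.ylabel("f(x)")
plt.legend()
plt.show()
```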
This visualization shows the core idea: gradient descent repeatedly uses the derivative (the slope) to determine the direction of the next step, iteratively moving towards a minimum point of the function. In machine learning, the function being minimized is the cost function, and the position x represents the parameters of the model. By finding the parameters that minimize the cost, we find the model that best fits the data.