In the previous chapter, we learned that the derivative of a function, f′(x), tells us the instantaneous rate of change, or the slope of the tangent line, at any given point x. Now, let's see how this concept of slope helps us find specific points of interest on a function's graph: the peaks and valleys.
Think about walking along a hilly path represented by a function f(x). When you reach the very top of a hill (a maximum point) or the very bottom of a valley (a minimum point), for a brief moment, the path under your feet is level, or horizontal. A horizontal path has a slope of zero.
This intuition directly translates to calculus. If a smooth function f(x) reaches a local maximum or a local minimum at a point x=c, and the derivative exists at that point, then the slope of the tangent line at that point must be zero. Mathematically, we write this as:
f′(c)=0
Points where the derivative is zero (or undefined, though we'll focus on zero for now) are called critical points. These critical points are the candidates for where local maxima or minima might occur.
Consider the function f(x) = x² − 4x + 5. This is a parabola that opens upward, so it has a single minimum. Let's find its derivative using the power rule and sum/constant rules from the previous chapter:
f′(x) = d/dx (x² − 4x + 5) = 2x − 4 + 0 = 2x − 4
Now, let's find where the slope is zero by setting the derivative equal to zero:
f′(x) = 0
2x − 4 = 0
2x = 4
x = 2
This tells us that at x=2, the slope of the function is zero. Let's look at the graph:
The plot shows the parabola f(x) = x² − 4x + 5. The red dot marks the minimum point at (2, 1). The dashed red line is the tangent line at this point, which is horizontal, indicating a slope of f′(2) = 0.
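If you want to check this kind of computation programmatically, a symbolic math library can carry out the same steps. The sketch below uses SymPy (an assumed tool, not something this chapter depends on) to differentiate f, solve f′(x) = 0, and evaluate f at the resulting critical point.

```python
import sympy as sp

x = sp.symbols('x')
f = x**2 - 4*x + 5                      # f(x) = x^2 - 4x + 5

f_prime = sp.diff(f, x)                 # derivative: 2*x - 4
critical_points = sp.solve(sp.Eq(f_prime, 0), x)

print(f_prime)                          # 2*x - 4
print(critical_points)                  # [2]
print(f.subs(x, critical_points[0]))    # 1, the value of f at the minimum
```

The only critical point is x = 2, and f(2) = 1, which matches the red dot on the plot.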
So, the general process to find potential locations of maximum or minimum values (the critical points) for a function f(x) involves these steps:

1. Compute the derivative f′(x).
2. Set the derivative equal to zero, f′(x) = 0, and solve for x.
3. The solutions, together with any points where f′(x) is undefined, are the critical points: the candidates for local maxima and minima.
It's important to remember that solving f′(x) = 0 only gives you the candidates. While these points often correspond to local maxima or minima, they could occasionally be other types of points (like inflection points where the curve flattens out momentarily before continuing in the same general direction). For our purposes in machine learning optimization, we are usually searching for a minimum, and the condition f′(x) = 0 is the fundamental starting point for finding it.
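As a rough sketch of this procedure, the helper below applies the same steps to any symbolic expression. The function name find_critical_points is invented for this illustration; it simply differentiates, sets the result to zero, and returns the real solutions. The second call shows the caveat just mentioned: x³ has a critical point at x = 0 that is neither a maximum nor a minimum.

```python
import sympy as sp

def find_critical_points(f, x):
    """Return the real solutions of f'(x) = 0: the candidate maxima/minima."""
    f_prime = sp.diff(f, x)                        # step 1: compute the derivative
    candidates = sp.solve(sp.Eq(f_prime, 0), x)    # step 2: set it to zero and solve
    return [c for c in candidates if c.is_real]    # step 3: keep the real candidates

x = sp.symbols('x')
print(find_critical_points(x**2 - 4*x + 5, x))   # [2]  -> the minimum we found above
print(find_critical_points(x**3, x))             # [0]  -> a flat point, but not a max or min
```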
This procedure of finding where the derivative is zero is not just a mathematical exercise. It forms the basis of how many machine learning algorithms are "trained".
Imagine you have a machine learning model, and you define a cost function (also called a loss function). This function measures how "bad" the model's predictions are compared to the actual data. A high value means the model is performing poorly; a low value means it's doing well.
The goal of training the model is to adjust its internal parameters to make the cost function as small as possible, ideally reaching its minimum value. If we can represent the cost as a function of the model's parameters, we can use calculus to find the parameter values that minimize this cost. Finding where the derivative of the cost function equals zero helps us locate this minimum error state, leading to the best possible model performance according to our chosen cost measure.
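To make this concrete, consider a toy model with a single parameter w that predicts y ≈ w·x, with squared-error cost J(w) = Σ(w·xᵢ − yᵢ)². Differentiating and setting dJ/dw = 0 gives a closed-form answer, w = Σxᵢyᵢ / Σxᵢ². The sketch below (the data values are invented for illustration) computes that minimizing parameter directly.

```python
import numpy as np

# toy data: y is roughly 3 * x (values invented for illustration)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 5.9, 9.2, 11.8])

# cost: J(w) = sum((w * x_i - y_i)^2)
# dJ/dw = sum(2 * (w * x_i - y_i) * x_i) = 0  =>  w = sum(x_i * y_i) / sum(x_i^2)
w_best = np.sum(x * y) / np.sum(x ** 2)

print(w_best)                           # close to 3
print(np.sum((w_best * x - y) ** 2))    # the cost at its minimum for this data
```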
In the upcoming sections, we'll explore cost functions and the specific algorithm, Gradient Descent, which uses derivatives iteratively to find these minimum points, even when solving f′(x)=0 directly is difficult or impossible.