We've learned that the derivative, f′(x), tells us the instantaneous rate of change of a function f(x). It measures how quickly the function's output is changing with respect to its input, represented geometrically as the slope of the tangent line.
But what if we want to know how the rate of change itself is changing? Just as we can find the rate of change of a function, we can also find the rate of change of its derivative. This leads us to the concept of higher-order derivatives.
The most common higher-order derivative you'll encounter, especially in optimization contexts relevant to machine learning, is the second derivative.
Simply put, the second derivative is the derivative of the first derivative. If you have a function f(x), you find its first derivative, f′(x). Then, you differentiate f′(x) again to get the second derivative.
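To make "the derivative of the derivative" concrete, here is a minimal numerical sketch in Python. It approximates derivatives with a central difference and a step size of h = 1e-3. Both are our assumptions, not part of the text, and the helper name `derivative` is hypothetical. Applying the same helper twice gives an approximation of the second derivative.

```python
from typing import Callable

Func = Callable[[float], float]


def derivative(f: Func, h: float = 1e-3) -> Func:
    """Hypothetical helper: return a function approximating f' via a central difference."""
    def df(x: float) -> float:
        return (f(x + h) - f(x - h)) / (2 * h)
    return df


def cube(x: float) -> float:
    return x ** 3


# The second derivative is simply the derivative of the first derivative.
first = derivative(cube)
second = derivative(first)

print(round(first(1.0), 4))   # exact value: f'(1) = 3 * 1**2 = 3
print(round(second(1.0), 4))  # exact value: f''(1) = 6 * 1 = 6
```

Nesting the central difference evaluates f at x ± 2h, so this is meant for intuition rather than as a robust numerical method.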
Notation:
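Just like the first derivative, the second derivative has more than one common notation: f′′(x) (read "f double prime of x") and, in Leibniz notation with y = f(x), $\frac{d^2y}{dx^2}$.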
Calculation:
Calculating the second derivative involves applying the differentiation rules you've already learned, just one more time.
Let's take an example. Consider the function:
$$f(x) = x^3 + 2x^2 - 5x + 1$$
First, find the first derivative using the power rule, constant multiple rule, and sum rule:
$$f'(x) = \frac{d}{dx}(x^3) + \frac{d}{dx}(2x^2) - \frac{d}{dx}(5x) + \frac{d}{dx}(1)$$
$$f'(x) = 3x^{3-1} + 2(2x^{2-1}) - 5(1x^{1-1}) + 0$$
$$f'(x) = 3x^2 + 4x - 5$$
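Now, to find the second derivative, $f''(x)$, we differentiate $f'(x)$ with respect to $x$:
$$f''(x) = \frac{d}{dx}\big(f'(x)\big) = \frac{d}{dx}(3x^2 + 4x - 5)$$
$$f''(x) = \frac{d}{dx}(3x^2) + \frac{d}{dx}(4x) - \frac{d}{dx}(5)$$
$$f''(x) = 3(2x^{2-1}) + 4(1x^{1-1}) - 0$$
$$f''(x) = 6x + 4$$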
So, for $f(x) = x^3 + 2x^2 - 5x + 1$, the first derivative is $f'(x) = 3x^2 + 4x - 5$, and the second derivative is $f''(x) = 6x + 4$.
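The power-rule steps above are mechanical enough to code directly. This is a minimal sketch. It assumes a polynomial is stored as a list of coefficients ordered from the constant term upward, so $x^3 + 2x^2 - 5x + 1$ becomes `[1, -5, 2, 1]`. The function name `differentiate` is hypothetical.

```python
def differentiate(coeffs: list[int]) -> list[int]:
    """Apply the power rule term by term (hypothetical helper).

    coeffs[k] is the coefficient of x**k, so the term a*x**k becomes
    (k*a)*x**(k-1); the constant term (k = 0) differentiates to 0 and is dropped.
    """
    return [k * a for k, a in enumerate(coeffs)][1:]


f = [1, -5, 2, 1]                        # 1 - 5x + 2x^2 + x^3
f_prime = differentiate(f)               # 3x^2 + 4x - 5
f_double_prime = differentiate(f_prime)  # 6x + 4

print(f_prime)         # [-5, 4, 3]
print(f_double_prime)  # [4, 6]
```

Getting the second derivative is just a second call to the same function, mirroring the hand calculation.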
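The first derivative f′(x) tells us the slope of the function. The second derivative f′′(x) tells us how that slope is changing. If f′′(x) > 0, the slope is increasing and the graph is concave up (it curves upward, like a cup). If f′′(x) < 0, the slope is decreasing and the graph is concave down (it curves downward, like a cap).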
Think about velocity and acceleration. If f(x) represents the position of an object, then f′(x) represents its velocity (rate of change of position). Then f′′(x) represents its acceleration (rate of change of velocity). Positive acceleration means velocity is increasing; negative acceleration (deceleration) means velocity is decreasing.
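The "rate of change of a rate of change" idea also works on sampled data. As a rough sketch, assume a hypothetical object whose position in metres at t = 0, 1, 2, … seconds follows $s(t) = t^2$. Differencing the positions once estimates the velocity over each one-second interval. Differencing those values again estimates the acceleration. The helper name `differences` is ours.

```python
def differences(values: list[float]) -> list[float]:
    """Change between consecutive samples (assumes a time step of 1 unit)."""
    return [b - a for a, b in zip(values, values[1:])]


# Hypothetical positions s(t) = t**2 sampled at t = 0, 1, ..., 5.
positions = [t ** 2 for t in range(6)]  # [0, 1, 4, 9, 16, 25]

velocity = differences(positions)       # average velocity over each interval
acceleration = differences(velocity)    # how much that velocity changes each second

print(velocity)      # [1, 3, 5, 7, 9]
print(acceleration)  # [2, 2, 2, 2]
```

The constant second differences match $s''(t) = 2$. The velocity grows by the same amount every second, which is what a constant positive acceleration means.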
The graph shows $y = x^2$, which is always concave up ($f''(x) = 2$), and $y = -x^2$, which is always concave down ($f''(x) = -2$).
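A positive second derivative means concave up, and a negative one means concave down. That rule turns into a short check. This sketch assumes f′′ is already available as a Python function; the helper name `concavity` is hypothetical. It applies the check to the two curves in the graph and to the earlier example $f''(x) = 6x + 4$, which changes sign at $x = -2/3$.

```python
from typing import Callable


def concavity(second_derivative: Callable[[float], float], x: float) -> str:
    """Classify concavity at x from the sign of f''(x) (hypothetical helper)."""
    value = second_derivative(x)
    if value > 0:
        return "concave up"
    if value < 0:
        return "concave down"
    return "inconclusive"  # f''(x) = 0: the sign alone does not decide


def f_double_prime(x: float) -> float:
    return 6 * x + 4  # second derivative of x^3 + 2x^2 - 5x + 1


print(concavity(lambda x: 2.0, 0.0))    # y = x**2:  concave up
print(concavity(lambda x: -2.0, 0.0))   # y = -x**2: concave down
print(concavity(f_double_prime, -2.0))  # 6*(-2) + 4 = -8: concave down
print(concavity(f_double_prime, 1.0))   # 6*1 + 4 = 10: concave up
```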
Can we keep going? Yes. We can find the derivative of the second derivative, which is called the third derivative, denoted f′′′(x) or $\frac{d^3y}{dx^3}$. We could continue to find the fourth derivative, fifth derivative, and so on. These are collectively known as higher-order derivatives.
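Continuing the earlier example shows how quickly a polynomial's higher-order derivatives run out. Each differentiation lowers the degree by one. The cubic $f(x) = x^3 + 2x^2 - 5x + 1$ therefore has a constant third derivative, and its fourth derivative (written $f^{(4)}(x)$) is zero:

$$
\begin{aligned}
f''(x) &= 6x + 4 \\
f'''(x) &= \frac{d}{dx}(6x + 4) = 6 \\
f^{(4)}(x) &= \frac{d}{dx}(6) = 0
\end{aligned}
$$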
While higher-order derivatives exist, the first and second derivatives are the ones you will use most often in machine learning fundamentals, particularly for optimization. As we'll see later, the second derivative helps confirm whether a point where the first derivative is zero is actually a minimum (like the bottom of a valley in a cost function) or a maximum.
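As a preview of that idea, often called the second derivative test, here is a small sketch for the example $f(x) = x^3 + 2x^2 - 5x + 1$. It assumes we solve $f'(x) = 3x^2 + 4x - 5 = 0$ with the quadratic formula. It then checks the sign of $f''(x) = 6x + 4$ at each solution: a positive value indicates a local minimum, and a negative value a local maximum.

```python
import math


def f_prime(x: float) -> float:
    return 3 * x**2 + 4 * x - 5


def f_double_prime(x: float) -> float:
    return 6 * x + 4


# Critical points: roots of 3x^2 + 4x - 5 = 0 via the quadratic formula.
a, b, c = 3, 4, -5
disc = math.sqrt(b * b - 4 * a * c)  # sqrt(76)
critical_points = [(-b - disc) / (2 * a), (-b + disc) / (2 * a)]

results = []
for x in critical_points:
    # Sanity check: x really is a point where the slope is zero.
    assert math.isclose(f_prime(x), 0.0, abs_tol=1e-9)
    curvature = f_double_prime(x)
    if curvature > 0:
        kind = "local minimum"
    elif curvature < 0:
        kind = "local maximum"
    else:
        kind = "inconclusive (f''(x) = 0)"
    results.append((x, kind))
    print(f"x = {x:.4f}: f''(x) = {curvature:.4f} -> {kind}")
```

This reports $x \approx -2.1196$ as a local maximum ($f'' \approx -8.7178$) and $x \approx 0.7863$ as a local minimum ($f'' \approx 8.7178$). These are local results only. A cubic has no global minimum or maximum, because it grows without bound in one direction and falls without bound in the other.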
For now, the important idea is that we can differentiate a function multiple times, and the second derivative provides information about the function's curvature or concavity.
© 2025 ApX Machine Learning