In the previous section, we reviewed limits, which provide the mathematical machinery needed to precisely define the concept of an instantaneous rate of change. Now, let's use limits to formally define the derivative.
Imagine you have a function, let's say f(x), that represents something you want to understand or optimize, perhaps the cost of a machine learning model based on a single parameter x. You're interested in how the output f(x) changes when you slightly nudge the input x.
Consider two points on the graph of y=f(x): one at x and another nearby at x+h. The corresponding y-values are f(x) and f(x+h). The change in x is (x+h)−x=h, and the change in y is f(x+h)−f(x).
The average rate of change between these two points is the slope of the line connecting them (called a secant line):
$$\text{Average Rate of Change} = \frac{\text{Change in } y}{\text{Change in } x} = \frac{f(x+h) - f(x)}{h}$$
This tells you, on average, how much f changes per unit change in x over the interval from x to x+h.
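To get a numeric feel for this, here is a minimal sketch of the calculation. The particular function f(x) = x², the base point x = 1, and the step h = 0.5 are just illustrative assumptions, not values from the text:

```python
def f(x):
    return x ** 2  # illustrative choice; any function works here

x, h = 1.0, 0.5  # assumed base point and step size

# Average rate of change = slope of the secant line from x to x + h
avg_rate = (f(x + h) - f(x)) / h
print(avg_rate)  # 2.5 -> on average, f changes by 2.5 units per unit change in x over [1, 1.5]
```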
Now, what happens if we want to know the rate of change exactly at the point x? We can find this instantaneous rate of change by making the interval h smaller and smaller, effectively bringing the second point closer and closer to the first. This is where limits come in. We take the limit of the average rate of change as h approaches zero.
The derivative of a function f(x) with respect to x, denoted as f′(x) (read "f prime of x") or dy/dx (Leibniz notation, read "dee y dee x"), is defined as the instantaneous rate of change of the function at point x. Formally, it's the limit:
$$f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$$
If this limit exists, we say the function f is differentiable at x. The value f′(a) gives the precise rate at which the function f(x) is changing at the specific point x=a.
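The limit itself can't be evaluated directly in code, but the difference quotient with a small, nonzero h gives a useful numerical approximation of it. A minimal sketch, using sin purely as an illustrative test function because its exact derivative, cos, is known:

```python
import math

def difference_quotient(f, x, h=1e-6):
    """Approximate f'(x) with the difference quotient from the definition, using a small h."""
    return (f(x + h) - f(x)) / h

x = 0.8  # arbitrary test point, chosen only for the example
approx = difference_quotient(math.sin, x)
exact = math.cos(x)  # the known derivative of sin evaluated at x
print(approx, exact)  # the two values agree to several decimal places
```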
Geometrically, the derivative f′(a) represents the slope of the line tangent to the graph of y=f(x) at the point (a,f(a)). The tangent line is the line that "just touches" the curve at that point and has the same direction as the curve at that point.
As h→0, the secant line connecting (x,f(x)) and (x+h,f(x+h)) pivots and approaches the tangent line at (x,f(x)). The slope of this limiting line is the derivative.
For example, take f(x)=x² at the point (1,1). As the second point (controlled by h) moves closer to (1,1), the slope of the secant line approaches the slope of the tangent line at x=1. That tangent slope is the derivative f′(1); since f′(x)=2x, we get f′(1)=2.
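This convergence is easy to check numerically. A small sketch reproducing the example above (f(x) = x² at x = 1), with h shrinking toward zero:

```python
def f(x):
    return x ** 2

x = 1.0
for h in [1.0, 0.1, 0.01, 0.001, 0.0001]:
    secant_slope = (f(x + h) - f(x)) / h
    print(h, secant_slope)
# Secant slopes: 3.0, 2.1, 2.01, 2.001, 2.0001 -> approaching f'(1) = 2
```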
In machine learning, we often work with cost functions (or loss functions) that measure how poorly our model is performing. Let's say J(w) is the cost function, where w represents a parameter (like a weight) in our model. We want to adjust w to minimize J(w).
The derivative dJ/dw tells us how sensitive the cost J is to small changes in the parameter w: its sign tells us whether increasing w will increase or decrease the cost, and its magnitude tells us how quickly the cost changes.
This information about the direction and magnitude of change is exactly what optimization algorithms like gradient descent use to iteratively update model parameters and find the minimum cost. Understanding the derivative is therefore fundamental to understanding how machine learning models are trained.
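To make that connection concrete, here is a minimal gradient descent sketch on a hypothetical one-parameter cost J(w) = (w − 3)², whose derivative dJ/dw = 2(w − 3) is known exactly. The cost function, starting value, learning rate, and iteration count are all assumptions made for illustration:

```python
def cost(w):
    return (w - 3.0) ** 2      # hypothetical cost, minimized at w = 3

def cost_derivative(w):
    return 2.0 * (w - 3.0)     # dJ/dw for the cost above

w = 0.0                        # assumed starting parameter value
learning_rate = 0.1            # assumed step size

for step in range(25):
    grad = cost_derivative(w)  # direction and magnitude of change in J
    w -= learning_rate * grad  # step against the derivative to reduce the cost

print(w, cost(w))              # w ends up close to 3 and the cost is near 0
```

Each update moves w a small amount in the direction that decreases J, and the size of each step shrinks as the derivative's magnitude shrinks near the minimum.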
In the following sections, we'll learn practical rules for calculating derivatives without having to go back to the limit definition every time, and we'll explore how to use derivatives to find the minimum and maximum points of functions.