Limits provide the mathematical framework to precisely define the instantaneous rate of change. This framework is then applied to formally define the derivative.
Imagine you have a function, let's say $f(x)$, that represents something you want to understand or optimize, perhaps the cost of a machine learning model based on a single parameter $x$. You're interested in how the output changes when you slightly nudge the input $x$.
Consider two points on the graph of $f$: one at $x$ and another nearby at $x + h$, where $h$ is a small change in the input. The corresponding y-values are $f(x)$ and $f(x + h)$. The change in $x$ is $h$, and the change in $y$ is $f(x + h) - f(x)$.
The average rate of change between these two points is the slope of the line connecting them (called a secant line):

$$\frac{\text{change in } y}{\text{change in } x} = \frac{f(x + h) - f(x)}{h}$$
This tells you, on average, how much $f$ changes per unit change in $x$ over the interval from $x$ to $x + h$.
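To make this concrete, here is a minimal Python sketch that computes the slope of a secant line; the sample function $f(x) = x^2$ and the interval are illustrative choices, not taken from the text above:

```python
def f(x):
    # A sample function to experiment with (illustrative choice).
    return x ** 2

def average_rate_of_change(f, x, h):
    # Slope of the secant line through (x, f(x)) and (x + h, f(x + h)).
    return (f(x + h) - f(x)) / h

# Average change of f per unit change in x over the interval from 1.0 to 1.5.
print(average_rate_of_change(f, x=1.0, h=0.5))  # 2.5
```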
Now, what happens if we want to know the rate of change exactly at the point $x$? We can find this instantaneous rate of change by making the interval smaller and smaller, effectively bringing the second point closer and closer to the first. This is where limits come in. We take the limit of the average rate of change as $h$ approaches zero.
The derivative of a function $f$ with respect to $x$, denoted $f'(x)$ (read "f prime of x") or $\frac{dy}{dx}$ (Leibniz notation, read "dee y dee x"), is defined as the instantaneous rate of change of the function at the point $x$. Formally, it's the limit:

$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$$
If this limit exists, we say the function is differentiable at $x$. The value $f'(x)$ gives the precise rate at which the function is changing at the specific point $x$.
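In code, you can approximate this limit by evaluating the difference quotient with a very small $h$. Below is a minimal sketch; the helper name `numerical_derivative` and the test functions are illustrative choices, not a library API:

```python
import math

def numerical_derivative(f, x, h=1e-6):
    # Difference quotient with a small h approximates the limit above.
    return (f(x + h) - f(x)) / h

# Compare against known derivatives: d/dx x^2 = 2x, and d/dx sin(x) = cos(x).
print(numerical_derivative(lambda x: x ** 2, 3.0))  # close to 6.0
print(numerical_derivative(math.sin, 0.0))          # close to 1.0
```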
Geometrically, the derivative $f'(x)$ represents the slope of the line tangent to the graph of $f$ at the point $(x, f(x))$. The tangent line is the line that "just touches" the curve at that point and has the same direction as the curve at that point.
As $h \to 0$, the secant line connecting $(x, f(x))$ and $(x + h, f(x + h))$ pivots and approaches the tangent line at $(x, f(x))$. The slope of this limiting line is the derivative.
As the second point (controlled by $h$) gets closer to the first point on the curve $y = f(x)$, the slope of the secant line approaches the slope of the tangent line at $x$. The slope of the tangent line is the derivative $f'(x)$. For example, for $f(x) = x^2$ at $x = 1$, the secant slopes approach $2$, so the tangent line at that point has slope $2$.
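You can watch this pivoting numerically: shrinking $h$ makes the secant slope settle toward the tangent slope. A short sketch, using the same illustrative choice of $f(x) = x^2$ at $x = 1$:

```python
def f(x):
    return x ** 2

x = 1.0
for h in [1.0, 0.5, 0.1, 0.01, 0.001]:
    secant_slope = (f(x + h) - f(x)) / h
    print(f"h = {h:<6} secant slope = {secant_slope:.4f}")
# The slopes 3.0, 2.5, 2.1, 2.01, 2.001 approach 2.0, the tangent slope at x = 1.
```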
In machine learning, we often work with cost functions (or loss functions) that measure how poorly our model is performing. Let's say $J(w)$ is the cost function, where $w$ represents a parameter (like a weight) in our model. We want to adjust $w$ to minimize $J(w)$.
The derivative $\frac{dJ}{dw}$ tells us how sensitive the cost $J$ is to small changes in the parameter $w$. Its sign tells us the direction of change (whether nudging $w$ upward raises or lowers the cost), and its magnitude tells us how steeply the cost changes near the current value of $w$.
This information about the direction and magnitude of change is exactly what optimization algorithms like gradient descent use to iteratively update model parameters and find the minimum cost. Understanding the derivative is therefore fundamental to understanding how machine learning models are trained.
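As a rough illustration (not a full training loop), here is a minimal gradient descent sketch on a made-up cost $J(w) = (w - 3)^2$; the cost function, learning rate, and starting point are all hypothetical choices:

```python
def cost(w):
    # Hypothetical cost with its minimum at w = 3.
    return (w - 3) ** 2

def cost_derivative(w):
    # dJ/dw for the cost above: 2 * (w - 3).
    return 2 * (w - 3)

w = 0.0             # initial parameter guess
learning_rate = 0.1

for step in range(50):
    # Step against the derivative: move downhill on the cost curve.
    w = w - learning_rate * cost_derivative(w)

print(w, cost(w))   # w ends up close to 3.0, and the cost close to 0.0
```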
In the following sections, we'll learn practical rules for calculating derivatives without having to go back to the limit definition every time, and we'll explore how to use derivatives to find the minimum and maximum points of functions.