Okay, we've established that machine learning models are often represented as functions, and our goal is typically to optimize them, finding the inputs (parameters) that yield the best possible output (e.g., lowest error). But how do we systematically find these "best" parameters? Just trying random values is inefficient, especially when models have millions of parameters. We need a way to know which direction to adjust the parameters.
Imagine you're standing on a hillside in thick fog and want to get to the bottom (the lowest point). You can't see the whole landscape, but you can feel the slope of the ground right where you are. If the ground slopes down to your left, you'd take a step left. If it slopes down in front of you, you'd step forward. This "slope" tells you the direction of steepest descent.
In mathematics, the tool for measuring the instantaneous slope or rate of change of a function at a specific point is the derivative.
Think about a function f(x) that represents our model's cost (how bad it is) based on a single parameter x. We want to find the value of x that minimizes f(x). The derivative of f(x) with respect to x, often written as f′(x) or df/dx, tells us how much the output f(x) changes for a tiny change in the input x.
Geometrically, the derivative f′(a) gives the slope of the line tangent to the graph of y=f(x) at the point where x=a.
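Before we get to the formal limit definition in later chapters, one way to make this concrete is to approximate the derivative numerically: nudge the input by a tiny amount and see how much the output moves. Here is a minimal Python sketch of that idea (the function g and the step size h are illustrative choices, not anything prescribed by the text):

```python
def g(x):
    # A simple example function: g(x) = x^2, whose exact derivative is 2x
    return x ** 2

def numerical_slope(func, x, h=1e-6):
    # Approximate the derivative: change in output divided by a tiny change in input
    return (func(x + h) - func(x)) / h

# At x = 3 the exact slope is 2 * 3 = 6; the approximation should be very close
print(numerical_slope(g, 3.0))  # roughly 6.000001
```

The ratio (g(x + h) − g(x)) / h is exactly the "change in output for a tiny change in input" described above; as h shrinks, it approaches the true derivative.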
Consider the function f(x)=x²−2x+3. Let's visualize its slope at different points:
The plot shows the function f(x)=x²−2x+3. At x=−1, the tangent line (red dashed) slopes downward (negative derivative), indicating the function is decreasing. At x=1, the tangent line (yellow dashed) is horizontal (zero derivative), indicating a potential minimum. At x=3, the tangent line (green dashed) slopes upward (positive derivative), indicating the function is increasing.
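To check the picture numerically, here is a short Python sketch (the variable names are ours) that evaluates this function and its exact derivative, f′(x) = 2x − 2, at the three points from the plot:

```python
def f(x):
    # The cost-like function from the plot: f(x) = x^2 - 2x + 3
    return x ** 2 - 2 * x + 3

def f_prime(x):
    # Its exact derivative: f'(x) = 2x - 2
    return 2 * x - 2

for point in (-1, 1, 3):
    print(f"x = {point:>2}: f(x) = {f(point)}, slope = {f_prime(point)}")

# x = -1: slope -4 (negative, the function is decreasing)
# x =  1: slope  0 (flat tangent, the minimum)
# x =  3: slope  4 (positive, the function is increasing)
```

The signs of the computed slopes match the tangent lines in the plot.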
This concept is directly applicable to optimizing machine learning models. Our "cost function" is the function we want to minimize. The model's parameters are the inputs to this function.
If we calculate the derivative of the cost function with respect to a specific parameter, its sign tells us which way to adjust that parameter: a positive derivative means increasing the parameter increases the cost (so we should decrease it), while a negative derivative means increasing the parameter decreases the cost (so we should increase it).
By calculating derivatives for all parameters, we know the "direction of steepest ascent" for the cost. To minimize the cost, we simply take a small step in the opposite direction. This is the core idea behind gradient descent, one of the most common optimization algorithms in machine learning, which we will explore in detail later.
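As a preview of that idea, here is a minimal gradient descent sketch in Python on the same function f(x)=x²−2x+3 (the learning rate and number of steps are arbitrary illustrative choices):

```python
def f_prime(x):
    # Derivative of f(x) = x^2 - 2x + 3
    return 2 * x - 2

x = 3.0              # start somewhere on the "hillside"
learning_rate = 0.1  # size of each step (a small, arbitrary value)

for step in range(25):
    slope = f_prime(x)
    x = x - learning_rate * slope  # step opposite to the slope, i.e. downhill

print(x)  # approaches 1.0, where the derivative is zero (the minimum)
```

Each iteration moves the parameter a small amount against the slope, so the cost shrinks until the derivative is (nearly) zero.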
For now, the important takeaway is that derivatives provide a quantitative measure of how a function's output changes with its input. This measure is essential for navigating the complex "landscape" of a model's cost function and iteratively adjusting parameters to find the optimal values. In the following chapters, we'll formalize the definition of the derivative, learn how to calculate derivatives for various functions, and extend these ideas to functions with many inputs (parameters), which is the typical scenario in machine learning.