Okay, we've set up our simple linear regression model, y = mx + b, and defined its cost function, J(m, b), which measures how well the line fits the data. We also know how to calculate the gradient of this cost function: the vector containing the partial derivatives ∂J/∂m and ∂J/∂b. These derivatives tell us how the cost changes as we slightly adjust m or b.
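To make these quantities concrete, here is a minimal sketch in Python, assuming a mean squared error cost J(m, b) = (1/2n) Σ (m·xᵢ + b − yᵢ)² (the specific cost and the helper names `cost` and `gradient` are illustrative, not fixed by the text):

```python
# Assumed cost: mean squared error with a 1/(2n) convention, so the
# gradient comes out without stray factors of 2.

def cost(m, b, xs, ys):
    """J(m, b): average squared prediction error over the dataset."""
    n = len(xs)
    return sum((m * x + b - y) ** 2 for x, y in zip(xs, ys)) / (2 * n)

def gradient(m, b, xs, ys):
    """Return (dJ/dm, dJ/db) evaluated at the current (m, b)."""
    n = len(xs)
    errors = [m * x + b - y for x, y in zip(xs, ys)]
    dJ_dm = sum(e * x for e, x in zip(errors, xs)) / n  # how J changes with m
    dJ_db = sum(errors) / n                             # how J changes with b
    return dJ_dm, dJ_db
```

On data that lies exactly on a line, the gradient at the true (m, b) is zero, which is one quick sanity check for an implementation like this.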
Now, how do we actually use these gradients to improve our model parameters and minimize the cost? This is where the gradient descent algorithm comes into play.
Think of the cost function J(m,b) as defining a surface, perhaps like a landscape with hills and valleys. Our goal is to find the lowest point in this landscape, the minimum cost. The parameters m and b define our current location on this surface.
The gradient, ∇J = [∂J/∂m, ∂J/∂b], at our current location points in the direction of steepest ascent, the quickest way uphill. Since we want to minimize the cost, we need to go downhill. Therefore, we take a step in the direction opposite to the gradient.
This leads us to the core update rule for gradient descent. For each parameter, we adjust its current value by subtracting a small amount proportional to its partial derivative:

m_new = m_old − α · ∂J/∂m
b_new = b_old − α · ∂J/∂b

Here, m_old and b_old are the parameter values before the update step, and m_new and b_new are the values after the update step. The gradients ∂J/∂m and ∂J/∂b are calculated using the current values (m_old, b_old).
Notice the symbol α (alpha) in the update rules. This is the learning rate, which we'll discuss in the next section. For now, think of it as controlling the size of the step we take downhill. It's a small positive number (e.g., 0.01, 0.1).
This single calculation and update for both m and b constitutes one step of gradient descent.
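A single step can be sketched as a small function. The gradient function passed in and the default α are illustrative assumptions, not values fixed by the text:

```python
# One gradient descent step: evaluate the gradient at the current
# parameters, then move a small distance in the opposite direction.

def gradient_descent_step(m_old, b_old, grad_fn, alpha=0.1):
    dJ_dm, dJ_db = grad_fn(m_old, b_old)  # gradient at the current location
    m_new = m_old - alpha * dJ_dm         # step against the slope in m
    b_new = b_old - alpha * dJ_db         # step against the slope in b
    return m_new, b_new

# Toy example: for J(m, b) = m^2 + b^2 the gradient is (2m, 2b),
# so one step from (1, 1) with alpha = 0.1 lands at (0.8, 0.8).
m, b = gradient_descent_step(1.0, 1.0, lambda m, b: (2 * m, 2 * b))
```

Note that both parameters are updated from the same old values; computing ∂J/∂b with an already-updated m would no longer be a true gradient step.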
The diagram shows the flow for a single step in gradient descent. Start with the current model parameters, calculate the cost function's gradient using those parameters, and then update the parameters by moving slightly in the opposite direction of the gradient. This process is typically repeated many times.
It's important to understand that gradient descent is an iterative algorithm. One step usually isn't enough to reach the minimum cost. We repeat this process of calculating the gradient and updating the parameters many times. With each step, we (hopefully) move closer to the values of m and b that minimize the cost function, resulting in a better-fitting linear regression model.
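Putting the pieces together, the iteration can be sketched as a short loop. This again assumes a mean squared error cost; α = 0.1 and 200 steps are illustrative choices, not prescribed values:

```python
# Repeated gradient descent steps on a tiny dataset, assuming an MSE cost.

def fit(xs, ys, alpha=0.1, steps=200):
    m, b = 0.0, 0.0                       # arbitrary starting point
    n = len(xs)
    for _ in range(steps):
        errors = [m * x + b - y for x, y in zip(xs, ys)]
        dJ_dm = sum(e * x for e, x in zip(errors, xs)) / n
        dJ_db = sum(errors) / n
        m -= alpha * dJ_dm                # move opposite the gradient
        b -= alpha * dJ_db
    return m, b

# Data lying near y = 2x + 1; repeated steps should drive (m, b)
# toward roughly m ≈ 2 and b ≈ 1.
m, b = fit([0.0, 1.0, 2.0, 3.0], [1.0, 3.1, 4.9, 7.0])
```

With too few steps or too small an α the loop stops short of the minimum, which is exactly the iterative behavior described above.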
This step, guided by the derivatives we calculated, is the fundamental mechanism by which many machine learning models learn from data. By repeatedly adjusting parameters in the direction that reduces error (cost), the model gradually improves its predictions.
© 2025 ApX Machine Learning