In the previous section, we talked about the general idea of optimization: finding the lowest or highest points of functions. In machine learning, this isn't just an abstract exercise. We optimize specific functions to make our models learn effectively from data. But what exactly are we trying to minimize or maximize?
Think about training a machine learning model. The goal is usually to make predictions that are as close as possible to the actual, real-world outcomes. For example, if we're predicting house prices, we want our model's price prediction to be very close to the price the house actually sold for. If we're classifying images, we want the model to assign the correct label (like "cat" or "dog").
We need a way to measure how "wrong" our model's predictions are compared to the true values. This measure is captured by a cost function, often also called a loss function or, more generally, an objective function.
A cost function takes the model's predictions and the actual target values as inputs and outputs a single number. This number represents the cost or penalty for the errors the model is making on the data it's being trained on.
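To make that interface concrete before we pick a specific formula, here is a minimal sketch of a cost function in code. The use of average absolute error and the function name `cost` are our own illustrative choices, not a standard API:

```python
import numpy as np

def cost(predictions, actuals):
    """Toy cost: the average absolute error between predictions and targets.

    Takes the model's predictions and the true values, returns one number.
    """
    predictions = np.asarray(predictions, dtype=float)
    actuals = np.asarray(actuals, dtype=float)
    return np.mean(np.abs(predictions - actuals))

# Example: predicted vs. actual house prices (in $1000s)
print(cost([250, 310, 180], [260, 300, 200]))  # 13.33... -> one penalty number
```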
Imagine you're learning to play darts. Your cost function could be the average distance your darts land from the bullseye. A lower average distance (lower cost) means you're getting better. Similarly, in machine learning, we want to adjust the model to achieve the lowest possible cost.
One of the most common cost functions, especially for regression problems (like predicting prices), is the Mean Squared Error (MSE). Let's break down how it works for a dataset with N examples:
The formula for MSE is:

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_{\mathrm{pred},i} - y_{\mathrm{actual},i}\right)^2$$

Here, $\sum$ (sigma) means "sum up," and $i=1$ to $N$ tells us to sum over all data points from the first ($i=1$) to the last ($i=N$). Inside the sum, each prediction error $y_{\mathrm{pred},i} - y_{\mathrm{actual},i}$ is squared, which makes every term positive and penalizes large errors more heavily than small ones. The factor $\frac{1}{N}$ then takes the average.
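Translating the formula directly into code makes each piece visible. A minimal sketch using NumPy; the function name `mse` is our own choice:

```python
import numpy as np

def mse(y_pred, y_actual):
    """Mean Squared Error: average of squared prediction errors."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_actual = np.asarray(y_actual, dtype=float)
    errors = y_pred - y_actual          # (y_pred,i - y_actual,i) for each i
    return np.mean(errors ** 2)         # square, sum over i, divide by N

# Small example with N = 3 data points
print(mse([2.5, 0.0, 2.0], [3.0, -0.5, 2.0]))  # (0.25 + 0.25 + 0.0) / 3 = 0.1666...
```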
Our goal in training a model using MSE is to find the model parameters (like the slope m and intercept b in a linear model y=mx+b) that make this MSE value as small as possible.
It's helpful to think of the cost function as defining a surface or a landscape. The "location" on this landscape is determined by the specific values chosen for the model's parameters (like m and b). The "height" of the landscape at any location represents the value of the cost function for those parameters.
For a simple linear model y=mx+b trying to fit some data, the MSE cost depends on the chosen values of m and b. If we simplify by fixing b and changing only m, we can plot the cost (MSE) against different values of m. For MSE this curve is a parabola: a single bowl with one lowest point.
Figure: a typical shape for a cost function (like MSE) plotted against a single model parameter. The lowest point represents the parameter value that minimizes the error.
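A short script like the following can produce that bowl-shaped curve. The toy data points and the fixed intercept b = 0 are assumptions made purely for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt

# Toy data roughly following y = 2x (so the true slope is about 2)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

b = 0.0                                  # fix the intercept, vary only m
m_values = np.linspace(0, 4, 100)
costs = [np.mean((m * x + b - y) ** 2) for m in m_values]

plt.plot(m_values, costs)
plt.xlabel("slope m")
plt.ylabel("MSE")
plt.title("Cost landscape for a single parameter")
plt.show()
```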
With multiple parameters (like m and b), this landscape becomes higher-dimensional (like a 3D bowl for two parameters), but the core idea remains the same: we are searching for the lowest point in this cost landscape. The coordinates of that lowest point give us the optimal parameter values for our model.
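For two parameters, we can make the same idea concrete by evaluating the cost over a grid of (m, b) pairs and locating the lowest point by brute force. A sketch, reusing the toy data from above:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

# Evaluate MSE over a grid of (m, b) candidates
m_grid, b_grid = np.meshgrid(np.linspace(0, 4, 200), np.linspace(-2, 2, 200))
costs = np.mean((m_grid[..., None] * x + b_grid[..., None] - y) ** 2, axis=-1)

# Coordinates of the lowest point on this cost surface
i, j = np.unravel_index(np.argmin(costs), costs.shape)
print(f"best m ~ {m_grid[i, j]:.2f}, best b ~ {b_grid[i, j]:.2f}, MSE ~ {costs[i, j]:.4f}")
```

Checking every grid point works for two parameters, but it quickly becomes infeasible as the number of parameters grows, which is exactly why we need a more efficient way to find the minimum.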
This brings us back to optimization. The cost function gives us the specific quantity we want to minimize, and the derivative-based techniques we are learning in this chapter are the tools for finding that minimum point efficiently.
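As a preview, one common derivative-based method is gradient descent: compute the partial derivatives of the MSE with respect to m and b, then repeatedly step in the downhill direction. A minimal sketch, with an arbitrary learning rate and step count chosen for illustration:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

m, b = 0.0, 0.0          # arbitrary starting point on the cost landscape
lr = 0.05                # learning rate: how far each downhill step moves

for _ in range(500):
    errors = m * x + b - y
    # Partial derivatives of MSE with respect to m and b
    grad_m = 2 * np.mean(errors * x)
    grad_b = 2 * np.mean(errors)
    m -= lr * grad_m     # step downhill on the cost surface
    b -= lr * grad_b

print(f"m ~ {m:.2f}, b ~ {b:.2f}, MSE ~ {np.mean((m * x + b - y) ** 2):.4f}")
```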