When you train a neural network, your primary objective is to minimize the loss function on the training dataset. The optimization algorithms we've discussed, like gradient descent and its variants, are designed precisely for this task. However, simply achieving a very low loss on the data the model was trained on doesn't guarantee good performance in a real-world application. This brings us to a common challenge in developing machine learning models: overfitting.
Overfitting occurs when a model learns the training data too well. Instead of capturing the underlying patterns and relationships in the data that generalize to new examples, the model starts memorizing the specific details and noise present only in the training set. Imagine studying for an exam by memorizing the exact answers to practice questions. You might ace the practice test, but you'd likely struggle with new questions covering the same concepts because you didn't learn the underlying principles. An overfitted model behaves similarly; it performs exceptionally well on the data it has already seen but fails to generalize to unseen data.
This leads to a situation where the model has low bias (it fits the training data closely) but high variance (its predictions change drastically with different training sets and it performs poorly on new data). The opposite problem, underfitting, occurs when the model is too simple to capture the underlying structure of the data, resulting in poor performance on both the training and validation sets (high bias). Our goal is usually to find a sweet spot between these two extremes.
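The underfitting/overfitting trade-off can be seen directly in a small experiment. The following sketch (my own illustration, not from the text) fits polynomials of increasing degree to ten noisy samples of a sine curve: degree 1 underfits (high error everywhere), degree 3 captures the trend, and degree 9 interpolates the noise, driving training error toward zero while validation error grows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ten noisy training samples of an underlying function y = sin(x).
x_train = np.linspace(0.0, 3.0, 10)
y_train = np.sin(x_train) + rng.normal(0.0, 0.2, size=x_train.shape)

# Held-out points from the same underlying function, used to measure
# generalization rather than fit to the training noise.
x_val = np.linspace(0.05, 2.95, 50)
y_val = np.sin(x_val)

def mse(coeffs, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

train_err, val_err = {}, {}
for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err[degree] = mse(coeffs, x_train, y_train)
    val_err[degree] = mse(coeffs, x_val, y_val)
    print(f"degree {degree}: train={train_err[degree]:.4f} "
          f"val={val_err[degree]:.4f}")
```

Note how the degree-9 model's training error is far lower than its validation error: it has low bias but high variance, the exam-memorization scenario in numerical form.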
Several factors can contribute to overfitting:

- Excessive model capacity: a network with far more parameters than the data can constrain has enough flexibility to memorize individual examples.
- Insufficient training data: with few examples, memorizing noise is easier than learning the underlying pattern.
- Noisy or mislabeled data: the model spends capacity fitting errors rather than signal.
- Training for too long: even a reasonably sized model will eventually begin fitting noise if optimization continues unchecked.
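One common contributor, model capacity that is large relative to the dataset, can be made concrete by counting parameters. A minimal sketch (the helper name and the layer sizes below are illustrative, not from the text):

```python
def mlp_param_count(layer_sizes):
    """Total weights and biases in a fully connected network
    with the given layer widths (input, hidden..., output)."""
    return sum(n_in * n_out + n_out  # weight matrix plus bias vector
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# A modest MLP for 28x28 images already has over half a million parameters.
n_params = mlp_param_count([784, 512, 256, 10])
n_train_examples = 50_000  # hypothetical dataset size
print(n_params, n_params / n_train_examples)
```

With more than ten parameters per training example, such a network has ample capacity to memorize its training set, which is one reason the mitigation techniques later in this chapter matter.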
The most common way to detect overfitting is by monitoring the model's performance on both the training set and a separate validation set during the training process.
If the model is learning well and generalizing, both the training and validation loss should decrease, and the relevant metrics should improve. However, if overfitting starts to occur, you'll typically observe:

- The training loss continues to decrease, and training metrics keep improving.
- The validation loss plateaus and then begins to increase, while validation metrics stagnate or degrade.
- The gap between training and validation performance steadily widens.
This divergence between training and validation performance is a clear indicator of overfitting. Visualizing the training and validation loss/metrics over epochs is a standard practice.
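This check can also be automated. A minimal sketch (the function name and the patience heuristic are my own, not from the text) that flags the epoch at which validation loss stops improving, which is the usual trigger for early stopping:

```python
def divergence_epoch(val_losses, patience=3):
    """Return the index of the best validation loss once it has been
    followed by `patience` consecutive epochs without improvement,
    or None if overfitting has not clearly set in yet."""
    best_epoch = 0
    since_best = 0
    for epoch in range(1, len(val_losses)):
        if val_losses[epoch] < val_losses[best_epoch]:
            best_epoch = epoch
            since_best = 0
        else:
            since_best += 1
            if since_best >= patience:
                return best_epoch
    return None

# Hypothetical validation curve: improves until epoch 3, then degrades.
val_curve = [1.00, 0.80, 0.62, 0.55, 0.58, 0.63, 0.71, 0.80]
print(divergence_epoch(val_curve))  # -> 3
```

Frameworks typically offer this pattern as a built-in early-stopping callback; the sketch above only shows the logic behind it.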
Figure: Typical learning curves, with the training loss consistently decreasing while the validation loss begins to increase after around epoch 25, indicating the onset of overfitting.
The ultimate goal of building a machine learning model is generalization: the ability to make accurate predictions on new, previously unseen data. Overfitting is a direct obstacle to achieving good generalization. Therefore, understanding, detecting, and mitigating overfitting are essential skills for any deep learning practitioner. The subsequent sections in this chapter will introduce several techniques specifically designed to combat overfitting and improve the generalization performance of your models.
© 2025 ApX Machine Learning