As we discussed in the chapter introduction, building a neural network that performs well on the data it was trained on is only the first step. Our primary objective is generalization, the model's ability to make accurate predictions on new, previously unseen data. Two common obstacles stand in the way of good generalization: underfitting and overfitting. Understanding these phenomena is essential for diagnosing training problems and improving your model's real-world performance.
Underfitting occurs when a model is too simple to capture the underlying structure of the data. It fails to learn the relevant patterns even in the training set, resulting in poor performance not just on new data, but also on the data it was trained on. Think of it as trying to draw a complex shape using only a straight ruler; you simply lack the right tool for the job.
Characteristics of Underfitting:

- High error on the training set
- Similarly high error on the validation set
- Little to no gap between training and validation performance
- Loss curves that plateau early at a high value

Common Causes:

- A model with too little capacity (too few layers or units)
- Insufficient training time (too few epochs)
- Features that carry too little information about the target
- Excessive regularization constraining the model
If you observe high error rates on both your training and validation sets, your model is likely underfitting. The solution usually involves increasing model complexity, training longer, engineering better features, or reducing regularization.
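This diagnostic logic can be sketched as a small helper function. The thresholds below are illustrative placeholders, not universal values; what counts as "high" error depends entirely on your task and baseline.

```python
def diagnose_fit(train_error, val_error, error_threshold=0.25, gap_threshold=0.10):
    """Rough diagnostic from final error rates.

    Thresholds are illustrative and problem-dependent:
    - error_threshold: training error above this suggests underfitting
    - gap_threshold: a train/validation gap above this suggests overfitting
    """
    if train_error > error_threshold:
        # The model cannot even fit the data it has seen.
        return "underfitting"
    if val_error - train_error > gap_threshold:
        # The model fits the training data but fails to generalize.
        return "overfitting"
    return "good fit"


print(diagnose_fit(train_error=0.40, val_error=0.42))  # high error everywhere
print(diagnose_fit(train_error=0.02, val_error=0.30))  # large generalization gap
print(diagnose_fit(train_error=0.05, val_error=0.08))  # both low, small gap
```

In practice you would compute these error rates on your actual training and validation sets; the point is that the comparison between the two numbers, not either one alone, drives the diagnosis.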
Overfitting is the opposite problem. It occurs when a model learns the training data too well, capturing not only the underlying patterns but also the noise and random fluctuations specific to that particular dataset. The model essentially memorizes the training examples instead of learning a generalizable rule. This leads to excellent performance on the training set but poor performance on new, unseen data. Imagine a student who memorizes the answers to specific practice questions but hasn't grasped the underlying concepts needed to solve new problems.
Characteristics of Overfitting:

- Very low error on the training set
- Much higher error on the validation set
- A gap between training and validation loss that grows over epochs
- Validation loss that plateaus or rises while training loss keeps falling

Common Causes:

- A model with too much capacity relative to the amount of data
- Too little training data, or noisy training data
- Training for too many epochs
- Insufficient regularization
Overfitting is a very common issue in neural network training. You can spot it when your training loss continues to decrease while your validation loss plateaus or even begins to increase. This divergence indicates that the model is no longer learning generalizable patterns and is instead specializing in the specifics of the training set.
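The divergence check can be automated. A minimal sketch, assuming you record per-epoch losses as plain lists: flag overfitting when training loss is still improving but validation loss has not hit a new best for some number of epochs (the `patience` parameter here is a name borrowed from early-stopping conventions, chosen for illustration).

```python
def epochs_since_best(val_losses):
    """Number of epochs elapsed since the lowest validation loss."""
    best = min(range(len(val_losses)), key=val_losses.__getitem__)
    return len(val_losses) - 1 - best


def is_overfitting(train_losses, val_losses, patience=3):
    """Flag overfitting when training loss keeps falling while
    validation loss has not improved for `patience` epochs."""
    still_improving = train_losses[-1] < train_losses[-patience - 1]
    return still_improving and epochs_since_best(val_losses) >= patience


# Illustrative histories: validation loss bottoms out at epoch 3, then rises.
train = [1.00, 0.80, 0.60, 0.50, 0.40, 0.30, 0.25]
val = [1.00, 0.85, 0.70, 0.65, 0.70, 0.75, 0.80]
print(is_overfitting(train, val))  # training still improving, validation diverging
```

The same bookkeeping underlies early stopping, covered later in this chapter: instead of merely flagging the divergence, you halt training and keep the weights from the best validation epoch.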
Underfitting and overfitting represent two extremes of the bias-variance tradeoff. Bias is error from overly simplistic assumptions built into the model; variance is error from excessive sensitivity to the particular training sample.
Ideally, we want a model with low bias and low variance. However, decreasing one tends to increase the other. Very simple models have high bias and low variance (they are consistently wrong in the same way). Very complex models can have low bias but high variance (they fit the current data almost perfectly but generalize poorly). The goal of training, and of techniques like regularization, is to find a sweet spot that balances this tradeoff, achieving good performance on unseen data.
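A toy sketch of this tradeoff, using NumPy polynomial regression rather than a neural network: we fit polynomials of increasing degree to noisy samples of a sine curve. The degrees, noise level, and sample sizes are illustrative choices, but the pattern they produce is the general one: too low a degree underfits (high train and test error), too high a degree overfits (tiny train error, worse test error).

```python
import numpy as np

rng = np.random.default_rng(0)

# 20 noisy training samples of sin(2*pi*x), plus a clean test grid.
x_train = np.linspace(0.0, 1.0, 20)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0.0, 0.2, x_train.size)
x_test = np.linspace(0.01, 0.99, 50)
y_test = np.sin(2 * np.pi * x_test)


def mse(degree, x, y):
    """Fit a degree-`degree` polynomial on the training set,
    then report mean squared error on (x, y)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    return np.mean((np.polyval(coeffs, x) - y) ** 2)


for degree in (1, 4, 12):
    print(f"degree {degree:2d}: "
          f"train MSE = {mse(degree, x_train, y_train):.4f}, "
          f"test MSE = {mse(degree, x_test, y_test):.4f}")
```

The degree-1 model is too rigid to follow the sine wave (high bias), while the degree-12 model bends to chase the noise in the 20 training points (high variance); an intermediate degree sits near the sweet spot.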
A standard practice to monitor for underfitting and overfitting is to plot the model's loss (and/or accuracy) on both the training set and a separate validation set over the course of training epochs.
Figure: comparison of the training loss curve against different validation loss curves. Underfitting shows high loss for both; a good fit shows both converging at a low value; overfitting shows validation loss increasing while training loss continues to decrease.
Observing these curves is fundamental:

- If both losses remain high and flat, the model is underfitting.
- If both losses decrease and converge at a low value, the model is fitting well.
- If training loss keeps decreasing while validation loss stalls or rises, the model is overfitting.
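Producing such a plot takes only a few lines, assuming matplotlib is available. The loss values below are made up for illustration; in practice they come from your training loop, recorded once per epoch.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; safe for scripts and servers
import matplotlib.pyplot as plt

# Illustrative per-epoch losses showing the overfitting pattern:
# training loss keeps falling while validation loss turns upward.
epochs = range(1, 11)
train_loss = [1.00, 0.70, 0.50, 0.38, 0.30, 0.24, 0.20, 0.17, 0.15, 0.13]
val_loss = [1.05, 0.75, 0.56, 0.47, 0.44, 0.45, 0.48, 0.52, 0.57, 0.62]

fig, ax = plt.subplots()
ax.plot(epochs, train_loss, label="training loss")
ax.plot(epochs, val_loss, label="validation loss")
ax.set_xlabel("epoch")
ax.set_ylabel("loss")
ax.legend()
fig.savefig("loss_curves.png")
```

The point where the validation curve bottoms out (epoch 5 in this made-up history) marks where continued training stops paying off, which is exactly the signal early stopping exploits.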
By monitoring these metrics, you can diagnose whether your model is underfitting, overfitting, or achieving a good balance. The following sections in this chapter will detail specific techniques like regularization and early stopping to combat overfitting and improve your model's ability to generalize.
© 2025 ApX Machine Learning