As we discussed, the primary goal in training a deep learning model is not just to achieve high accuracy on the data it was trained on, but to perform well on new, unseen data. This ability is known as generalization. When a model fails to generalize effectively, it usually falls into one of two categories: underfitting or overfitting. Let's examine these concepts more closely.
Imagine trying to draw a straight line through a set of points that clearly follow a curve. The line simply isn't complex enough to capture the underlying pattern. This is the essence of underfitting.
An underfit model fails to capture the significant patterns present in the training data. It's often too simple, perhaps having insufficient capacity (too few layers or neurons) or not being trained for long enough.
Symptoms of Underfitting:
The telltale symptom of underfitting is high error on the training data itself, accompanied by similarly high error on validation data: the model hasn't learned the relevant relationships between the input features and the target output. Such a model has high bias, meaning its assumptions about the data structure are too simplistic or incorrect. Increasing model capacity, adding more relevant features, or training for longer can help alleviate underfitting.
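The straight-line-through-a-curve picture can be made concrete. The sketch below (pure standard-library Python, with a synthetic dataset chosen for illustration) fits the best possible straight line to data that follows y = x², then measures the error on the training points themselves. Because the model family is too simple for the pattern, even the training error stays high, which is the signature of high bias.

```python
# Sketch: underfitting a curved relationship with a straight line.
# The data follows y = x^2; a linear model cannot capture this, so
# even the *training* error remains high (high bias).

def fit_line(xs, ys):
    """Closed-form least-squares fit of y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

def mse(xs, ys, a, b):
    """Mean squared error of the line y = a*x + b on (xs, ys)."""
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [x / 10 for x in range(-20, 21)]  # inputs in [-2, 2]
ys = [x ** 2 for x in xs]              # true relationship is quadratic

a, b = fit_line(xs, ys)
train_error = mse(xs, ys, a, b)
print(f"slope={a:.3f}, intercept={b:.3f}, training MSE={train_error:.3f}")
```

On this symmetric dataset the best line is nearly flat (slope close to zero), and the training MSE stays well above zero no matter how the line is chosen: no amount of extra training fixes an underfit model family, only added capacity or better features do.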
Now, consider the opposite scenario. Imagine drawing a highly complex, wiggly line that passes exactly through every single point in your training set, including any random noise or outliers. While this line perfectly describes the training data, it's unlikely to represent the true underlying trend, and it will likely perform poorly when asked to predict new points. This is overfitting.
An overfit model learns the training data too well. It captures not only the underlying patterns but also the noise and random fluctuations specific to the training set. It essentially memorizes the training examples instead of learning the general principles governing the data.
Symptoms of Overfitting:
The telltale symptom of overfitting is a growing gap between training and validation error: training error keeps decreasing while validation error stalls or begins to rise. Overfitting often occurs when the model has too much capacity relative to the amount of training data, or when training continues for too long. The model starts fitting the noise, leading to poor generalization. It has high variance, meaning its predictions are highly sensitive to the specific training data it saw. Techniques like regularization, getting more data, or using early stopping are common strategies to combat overfitting.
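The wiggly-line-through-every-point picture can also be demonstrated directly. The sketch below (standard-library Python, with a synthetic noisy dataset chosen for illustration) fits a polynomial that passes exactly through every noisy training point via Lagrange interpolation, then evaluates it on held-out points drawn from the same trend. The training error is zero by construction; the validation error is not, because the model has memorized the noise.

```python
import random

random.seed(0)  # fixed seed so the synthetic data is reproducible

def lagrange(xs, ys, x):
    """Evaluate the interpolating polynomial through (xs, ys) at x.
    This polynomial passes exactly through every training point."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def noisy_target(x):
    """True trend is y = x, observed with Gaussian noise."""
    return x + random.gauss(0, 0.3)

train_x = [i / 4 for i in range(9)]       # 9 training points in [0, 2]
train_y = [noisy_target(x) for x in train_x]
val_x = [0.1 + i / 4 for i in range(8)]   # offset held-out points
val_y = [noisy_target(x) for x in val_x]

def mse(xs, ys, model):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

model = lambda x: lagrange(train_x, train_y, x)
train_err = mse(train_x, train_y, model)
val_err = mse(val_x, val_y, model)
print(f"training MSE={train_err:.6f}, validation MSE={val_err:.6f}")
```

The interpolating polynomial achieves essentially zero training error yet a strictly worse validation error: it has memorized the training examples, noise included, instead of learning the simple underlying trend y = x.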
The relationship between training error and validation error over training epochs provides a useful diagnostic tool. We can visualize typical patterns for underfitting, overfitting, and a well-fitting model.
Comparison of error curves during training. Underfitting shows high error for both training (dashed blue) and validation (solid blue). Overfitting shows decreasing training error (dashed red) but increasing validation error (solid red) after some point. A good fit shows both errors decreasing and converging (green lines).
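The diagnostic described above leads naturally to early stopping: monitor validation error each epoch and halt once it stops improving. The sketch below uses hand-written error sequences as stand-ins for real measured losses (the numbers and the `patience` parameter are illustrative choices, not from any particular library).

```python
# Sketch of early stopping driven by the validation-error curve.
# The error sequences below are synthetic stand-ins for real losses.

def early_stop_epoch(val_errors, patience=2):
    """Return the epoch index with the best validation error, halting
    after `patience` consecutive epochs without improvement."""
    best_epoch, best_err, bad_epochs = 0, float("inf"), 0
    for epoch, err in enumerate(val_errors):
        if err < best_err:
            best_epoch, best_err, bad_epochs = epoch, err, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break  # validation error has stopped improving
    return best_epoch

# Typical overfitting pattern: training error keeps falling while
# validation error turns upward after epoch 3.
train_errors = [1.0, 0.6, 0.4, 0.25, 0.15, 0.08, 0.04]
val_errors   = [1.1, 0.7, 0.5, 0.45, 0.50, 0.60, 0.75]

print("best epoch:", early_stop_epoch(val_errors))  # → 3
```

Training error alone would suggest continuing past epoch 3, but the validation curve reveals that everything learned afterwards is noise. Restoring the model weights saved at the best epoch is the usual companion step in practice.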
Finding the right balance between model complexity and the patterns in the data is fundamental. A model that is too simple (underfit) won't learn enough, while a model that is too complex (overfit) learns the wrong things (noise). The techniques discussed in the following chapters, namely regularization and optimization strategies, are designed to help navigate this balance and build models that generalize well to new data.
© 2025 ApX Machine Learning