Throughout this chapter, we've focused on refining trained models, addressing issues like overfitting, and streamlining the training workflow. However, another significant aspect of optimizing model performance lies in selecting the right hyperparameters before training even begins. Unlike model parameters (weights and biases) that are learned from data, hyperparameters are configuration settings chosen by the practitioner. Getting these settings right can significantly impact training speed, convergence, and the final predictive power of your model.
Think of hyperparameters as the knobs and dials you set on your neural network building machine before you turn it on and start processing data. They define the model's architecture and control the learning process itself. Model parameters, in contrast, are the internal variables the machine adjusts automatically during operation (training) to minimize the loss function.
Common hyperparameters you've already encountered or will use include:
- The learning rate of the optimizer
- The batch size used during training
- The number of layers and the number of neurons per layer
- Regularization strength, such as the dropout rate
Choosing appropriate values for these hyperparameters is essential because there's rarely a single "best" set that works for all problems or datasets.
Finding the optimal combination of hyperparameters is often more art than science, and it presents several challenges: the search space grows rapidly as hyperparameters are added, each candidate configuration requires training and evaluating a full model, and hyperparameters often interact, so good values for one may depend on the values chosen for the others.
This process of searching for the best set of hyperparameters is known as hyperparameter tuning or hyperparameter optimization.
While manual tweaking based on experience plays a role, more systematic approaches are generally preferred.
Manual search is often the starting point. You make educated guesses based on common practice, prior experience, or published research, train the model, evaluate its performance, and then adjust the hyperparameters based on the results. For instance, if the model is overfitting, you might increase the regularization strength or reduce model complexity (fewer layers or neurons). If training is too slow or unstable, you might adjust the learning rate. While intuitive, this approach can be inefficient and may miss optimal combinations.
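As an illustration, the sketch below runs two manual trials of this kind with a small Keras model. The dataset, layer sizes, and the specific hyperparameter values are placeholders chosen for the example, not recommendations.

```python
import numpy as np
from tensorflow import keras

def build_model(learning_rate, dropout_rate):
    """Build a small classifier whose hyperparameters we set by hand."""
    model = keras.Sequential([
        keras.layers.Dense(64, activation="relu", input_shape=(20,)),
        keras.layers.Dropout(dropout_rate),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model

# Placeholder data standing in for a real training/validation split.
x_train, y_train = np.random.rand(500, 20), np.random.randint(0, 2, 500)
x_val, y_val = np.random.rand(100, 20), np.random.randint(0, 2, 100)

# Trial 1: an initial educated guess.
model = build_model(learning_rate=1e-3, dropout_rate=0.2)
model.fit(x_train, y_train, epochs=5, batch_size=32, verbose=0)
print("Trial 1 val accuracy:", model.evaluate(x_val, y_val, verbose=0)[1])

# Trial 2: suppose trial 1 overfit, so we increase the dropout rate.
model = build_model(learning_rate=1e-3, dropout_rate=0.4)
model.fit(x_train, y_train, epochs=5, batch_size=32, verbose=0)
print("Trial 2 val accuracy:", model.evaluate(x_val, y_val, verbose=0)[1])
```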
Grid Search is an exhaustive approach. You define a specific range or list of discrete values for each hyperparameter you want to tune. The algorithm then systematically trains and evaluates a model for every possible combination of these values.
For example, if you specify:
- Learning rates: [0.01, 0.001, 0.0001]
- Batch sizes: [32, 64]

Grid Search would train and evaluate 3 × 2 = 6 different models.
A conceptual view of Grid Search exploring all combinations (blue dots) in a 2D hyperparameter space defined by Learning Rate and Batch Size.
While thorough for small search spaces, Grid Search becomes computationally intractable as the number of hyperparameters or the number of values per hyperparameter increases (suffering from the "curse of dimensionality").
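For small spaces like the 3 × 2 example above, a grid search can be written as a simple pair of nested loops. The sketch below reuses the hypothetical build_model helper and placeholder data from the manual-search example; in practice you would substitute your own model-building function and dataset.

```python
import itertools

learning_rates = [0.01, 0.001, 0.0001]
batch_sizes = [32, 64]

results = {}
# Try every combination of the listed values: 3 x 2 = 6 models in total.
for lr, batch_size in itertools.product(learning_rates, batch_sizes):
    model = build_model(learning_rate=lr, dropout_rate=0.2)
    model.fit(x_train, y_train, epochs=5, batch_size=batch_size, verbose=0)
    _, val_accuracy = model.evaluate(x_val, y_val, verbose=0)
    results[(lr, batch_size)] = val_accuracy

best_combo = max(results, key=results.get)
print("Best combination (lr, batch size):", best_combo, "->", results[best_combo])
```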
Random Search offers a more efficient alternative. Instead of trying all combinations, it samples a fixed number of combinations randomly from the specified ranges or distributions for each hyperparameter.
Research (e.g., by Bergstra and Bengio) has shown that Random Search often finds better models than Grid Search within the same computational budget. This is because not all hyperparameters are equally important; Random Search has a higher chance of hitting good values for the important hyperparameters, whereas Grid Search spends much of its time exploring combinations where only unimportant parameters change.
Comparing how Grid Search samples points on a fixed grid versus how Random Search samples points more freely within the hyperparameter space. Random search might explore a wider range of values for each hyperparameter with the same number of trials.
You define ranges (e.g., learning rate between 0.0001 and 0.01) or distributions for your hyperparameters and specify the total number of combinations (n_iter) to try.
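A minimal version of this idea, again reusing the hypothetical build_model helper and placeholder data from the earlier sketches: sample the learning rate log-uniformly from the stated range, pick a batch size at random, and repeat for a fixed number of trials (the role played by n_iter).

```python
import numpy as np

rng = np.random.default_rng(seed=42)
n_iter = 6  # same budget as the 6-model grid above

results = {}
for _ in range(n_iter):
    # Sample the learning rate log-uniformly between 1e-4 and 1e-2.
    lr = 10 ** rng.uniform(-4, -2)
    batch_size = int(rng.choice([32, 64, 128]))

    model = build_model(learning_rate=lr, dropout_rate=0.2)
    model.fit(x_train, y_train, epochs=5, batch_size=batch_size, verbose=0)
    _, val_accuracy = model.evaluate(x_val, y_val, verbose=0)
    results[(lr, batch_size)] = val_accuracy

best_combo = max(results, key=results.get)
print("Best sampled combination (lr, batch size):", best_combo)
```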
Beyond Grid and Random Search, more sophisticated algorithms exist, such as:
- Bayesian Optimization, which builds a probabilistic model of the objective and uses it to pick promising combinations to evaluate next
- Hyperband and other early-stopping strategies, which allocate more training budget to promising configurations and stop poor ones early
- Evolutionary (genetic) algorithms, which mutate and recombine well-performing configurations over successive generations
These methods can be more efficient but are also more complex to implement and understand.
Tooling exists to automate these searches. KerasTuner integrates directly with Keras and implements Random Search, Hyperband, and Bayesian Optimization. Scikit-learn offers GridSearchCV and RandomizedSearchCV, which can be adapted for Keras models, though KerasTuner often provides tighter integration.

Hyperparameter tuning is an iterative process, often revisited as you gain more insights into your model and data. It's a key step in pushing your model's performance from acceptable to excellent.
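To make that integration concrete, here is a minimal KerasTuner sketch using its RandomSearch tuner. The layer sizes, hyperparameter ranges, placeholder data, and the directory and project names are illustrative choices for this example only.

```python
import numpy as np
import keras_tuner as kt
from tensorflow import keras

def build_model(hp):
    """Model-building function: KerasTuner passes in a HyperParameters object."""
    model = keras.Sequential([
        keras.layers.Dense(hp.Int("units", min_value=32, max_value=256, step=32),
                           activation="relu", input_shape=(20,)),
        keras.layers.Dropout(hp.Float("dropout", min_value=0.0, max_value=0.5, step=0.1)),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    lr = hp.Float("learning_rate", min_value=1e-4, max_value=1e-2, sampling="log")
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=lr),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model

# Placeholder data standing in for a real dataset.
x_train, y_train = np.random.rand(500, 20), np.random.randint(0, 2, 500)
x_val, y_val = np.random.rand(100, 20), np.random.randint(0, 2, 100)

tuner = kt.RandomSearch(
    build_model,
    objective="val_accuracy",
    max_trials=10,               # number of hyperparameter combinations to sample
    directory="tuning_results",  # placeholder output location
    project_name="demo",
)
tuner.search(x_train, y_train, epochs=5, validation_data=(x_val, y_val), verbose=0)

best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]
print("Best learning rate:", best_hps.get("learning_rate"))
```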