While cross-validation gives us a more reliable estimate of how a model might perform on unseen data, most machine learning models also have settings, called hyperparameters, that are not learned directly from the data but are set before the training process begins. Think of them as configuration knobs for the learning algorithm. Examples include the number of neighbors (n_neighbors) in KNN, the regularization strength (C) or kernel type (kernel) in Support Vector Machines (SVMs), or the depth of a decision tree. Finding the optimal values for these hyperparameters can significantly impact model performance.
Manually trying different combinations of hyperparameters, training the model, and evaluating it using cross-validation can be tedious and inefficient. Scikit-learn provides an automated way to perform this search: Grid Search.
It's important to distinguish between model parameters and hyperparameters:
- Model parameters are learned from the data during training, for example the coefficients of a linear regression model.
- Hyperparameters are set before training begins, for example k in KNN, C and gamma in SVM, or the learning rate in gradient descent.
Grid search focuses on finding the best hyperparameters.
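To make the distinction concrete, here is a minimal sketch (Ridge and the tiny dataset are illustrative choices, not part of this section's main example):
import numpy as np
from sklearn.linear_model import Ridge

# alpha is a hyperparameter: we choose it before training begins
model = Ridge(alpha=1.0)

X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 1.0, 2.0])
model.fit(X, y)

# coef_ and intercept_ are model parameters: learned from the data by fit()
print(model.coef_, model.intercept_)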
The idea behind grid search is straightforward: define a grid of candidate values for each hyperparameter, train and evaluate the model with cross-validation for every combination in that grid, and keep the combination with the best average score.
Using GridSearchCV
Scikit-learn's GridSearchCV class makes this process easy to implement. Let's break down how to use it.
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
We'll use the Iris dataset for this example. We also need to split our data into training and testing sets, as grid search should only be performed using the training data to avoid information leakage from the test set.
# Load dataset
iris = load_iris()
X, y = iris.data, iris.target
# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)
Choose the model (estimator) you want to tune. Here, we'll use SVC (Support Vector Classifier). Then, define the param_grid dictionary, which maps each hyperparameter name to the list of values to try.
# Define the estimator
svm_model = SVC()
# Define the grid of hyperparameters to search
param_grid = {
    'C': [0.1, 1, 10, 100],           # Regularization parameter
    'gamma': [1, 0.1, 0.01, 0.001],   # Kernel coefficient for 'rbf'
    'kernel': ['rbf', 'linear']       # Type of kernel
}
This grid specifies 4 values for C, 4 values for gamma, and 2 values for kernel. GridSearchCV will evaluate 4 × 4 × 2 = 32 different combinations of these hyperparameters. Note that the gamma parameter is only used by the rbf kernel; when kernel is linear, SVC simply ignores it, so those combinations are redundant but harmless.
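If you want to avoid the redundant fits, param_grid also accepts a list of grids that are searched separately. A minimal sketch (param_grid_alt is just an illustrative name):
# gamma is only varied for 'rbf', shrinking the search from 32 to 20 combinations
param_grid_alt = [
    {'kernel': ['rbf'], 'C': [0.1, 1, 10, 100], 'gamma': [1, 0.1, 0.01, 0.001]},
    {'kernel': ['linear'], 'C': [0.1, 1, 10, 100]},
]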
Instantiating GridSearchCV
Create an instance of GridSearchCV, passing the estimator, the parameter grid, the cross-validation strategy (cv), and optionally a scoring metric.
# Instantiate GridSearchCV
# cv=5 means 5-fold cross-validation
# scoring='accuracy' specifies the metric to optimize
grid_search = GridSearchCV(estimator=svm_model,
                           param_grid=param_grid,
                           cv=5,
                           scoring='accuracy',
                           verbose=1,  # Optional: prints progress
                           n_jobs=-1)  # Optional: use all available CPU cores
- estimator: The model instance (svm_model).
- param_grid: The dictionary defining the hyperparameters to try (param_grid).
- cv: The cross-validation splitting strategy. An integer (like 5) specifies K-Fold cross-validation (or Stratified K-Fold for classification). You can also pass specific CV splitter objects, as in the sketch after this list.
- scoring: The metric used to evaluate the performance of each hyperparameter combination. Common values include 'accuracy', 'precision', 'recall', and 'f1' for classification, and 'neg_mean_squared_error' and 'r2' for regression. If None, the estimator's default scorer is used.
- verbose: Controls the verbosity. Higher values output more messages.
- n_jobs: The number of CPU cores to use for parallel processing. -1 typically means use all available cores, which can significantly speed up the search.
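For example, here is a minimal sketch of passing an explicit splitter instead of an integer (cv_splitter and grid_search_shuffled are illustrative names; the other objects are as defined above):
from sklearn.model_selection import StratifiedKFold

# Shuffle before splitting into folds, with a fixed seed for reproducibility
cv_splitter = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

grid_search_shuffled = GridSearchCV(estimator=svm_model,
                                    param_grid=param_grid,
                                    cv=cv_splitter,
                                    scoring='accuracy')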
Fitting GridSearchCV
Fit the GridSearchCV object to the training data. This triggers the search process.
# Fit the grid search object to the training data
grid_search.fit(X_train, y_train)
This step can take some time, as it involves training and evaluating the model multiple times (number of combinations × number of CV folds). In our example, that's 32 × 5 = 160 model fits, plus one final refit on the full training set with the best parameters (GridSearchCV does this automatically because refit=True by default).
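If you want to sanity-check the size of a grid before committing to a long run, scikit-learn's ParameterGrid utility enumerates the combinations (a small sketch, assuming the param_grid defined above):
from sklearn.model_selection import ParameterGrid

# 4 values of C x 4 values of gamma x 2 kernels = 32 combinations
print(len(ParameterGrid(param_grid)))  # 32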
Once fitting is complete, GridSearchCV stores the results in several useful attributes:
best_params_: A dictionary containing the combination of hyperparameters that yielded the best mean cross-validation score.
# Print the best parameters found
print(f"Best Hyperparameters: {grid_search.best_params_}")
best_score_: The mean cross-validation score achieved with the best_params_.
# Print the best cross-validation score
print(f"Best Cross-Validation Accuracy: {grid_search.best_score_:.4f}")
best_estimator_: An estimator instance that has been automatically refit on the entire training dataset (X_train, y_train) using the best_params_ (again thanks to the default refit=True). This is the final model you'll typically use for predictions on new data (like the test set).
# Get the best estimator
best_svm_model = grid_search.best_estimator_
# Evaluate the best model on the test set
y_pred = best_svm_model.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
print(f"Test Set Accuracy with Best Model: {test_accuracy:.4f}")
cv_results_: A dictionary containing detailed information about all the combinations evaluated during the grid search. This can be useful for more in-depth analysis and is often converted to a Pandas DataFrame for easier inspection.
import pandas as pd
# Display detailed results (optional)
cv_results_df = pd.DataFrame(grid_search.cv_results_)
print(cv_results_df[['param_C', 'param_gamma', 'param_kernel',
                     'mean_test_score', 'rank_test_score']].sort_values('rank_test_score').head())
Grid search is exhaustive, so its cost grows quickly as you add hyperparameters or values. RandomizedSearchCV (which samples a fixed number of combinations randomly) or more advanced Bayesian optimization techniques can be more efficient alternatives, though GridSearchCV is often a good starting point.
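As a sketch of the randomized alternative (the distributions and n_iter value are illustrative choices; loguniform requires SciPy 1.4 or later):
from scipy.stats import loguniform
from sklearn.model_selection import RandomizedSearchCV

# Sample 10 combinations instead of exhaustively trying all of them
param_distributions = {
    'C': loguniform(1e-1, 1e2),     # continuous range instead of a fixed list
    'gamma': loguniform(1e-3, 1e0),
    'kernel': ['rbf', 'linear'],
}

random_search = RandomizedSearchCV(estimator=SVC(),
                                   param_distributions=param_distributions,
                                   n_iter=10,
                                   cv=5,
                                   scoring='accuracy',
                                   random_state=42,
                                   n_jobs=-1)
random_search.fit(X_train, y_train)
print(random_search.best_params_)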
By using GridSearchCV, you can systematically explore different hyperparameter settings for your models, leveraging cross-validation to find the configuration that performs best on average on unseen data, leading to more robust and better-performing machine learning solutions. This process is a standard and valuable step in building effective models.