Hyperparameter Tuning: Understanding Grid Search

Bala Priya C - Apr 14 '21 - Dev Community

In the previous blog post, Role of Cross-Validation, we looked at how a simple train/test split doesn’t suffice to get a reliable estimate of the out-of-sample accuracy, which motivated us to understand the importance of cross-validation.

In this blog post, we shall see how we can do hyperparameter search more conveniently with GridSearchCV in scikit-learn, and learn how to use it to tune multiple hyperparameters. Let’s get started!

Time for a quick recall!

We were looking at a classification problem on the iris dataset. We used 10-fold cross-validation and took the mean of the cross-validated accuracy scores as our estimate of out-of-sample accuracy.
Let’s quickly walk through the steps again.

# necessary imports
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score
import matplotlib.pyplot as plt
%matplotlib inline
# read in the iris data
iris = load_iris()

# create X (features) and y (response)
X = iris.data
y = iris.target
# 10-fold cross-validation with K=5 for KNN (the n_neighbors parameter)
knn = KNeighborsClassifier(n_neighbors=5)
scores = cross_val_score(knn, X, y, cv=10, scoring='accuracy')
# use average accuracy as an estimate of out-of-sample accuracy
print(scores.mean())
# Output
0.9666666666666668

We then wanted to identify the best value of K for the KNN classifier. To find it, we looped through a set of candidate values for K (n_neighbors) and picked the value that gave the highest cross-validated accuracy.

In the KNN classifier, a very small value of K makes the model needlessly complex (high variance), while a very large value of K results in a model with high bias; either way, performance suffers.

# search for an optimal value of K for KNN
k_range = list(range(1, 31))
k_scores = []
for k in k_range:
    knn = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(knn, X, y, cv=10, scoring='accuracy')
    k_scores.append(scores.mean())
# plot the value of K for KNN (x-axis) versus the cross-validated accuracy (y-axis)
plt.plot(k_range, k_scores)
plt.xlabel('Value of K for KNN')
plt.ylabel('Cross-Validated Accuracy')


[Plot: value of K for KNN (x-axis) versus cross-validated accuracy (y-axis)]

As K=13, 18, and 20 all gave the highest accuracy score, close to 0.98, we chose K=20, since a larger value of K yields a less complex model. While writing the for loop is not particularly difficult, we realize we may have to do it often.
Therefore, it would be good to have a more convenient way of doing the hyperparameter search, without having to write a loop every time and identify the best parameter through inspection.
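(For completeness, the “inspect and pick” step can itself be scripted; here is a minimal sketch, assuming the k_range and k_scores computed in the loop above, that prefers the largest K among the tied best values. GridSearchCV, introduced next, handles both the looping and the selection for us.)

# highest cross-validated accuracy observed in the manual search
best_score = max(k_scores)
# among the values of K tied for the best score, prefer the largest (least complex model)
best_k = max(k for k, score in zip(k_range, k_scores) if score == best_score)
print(best_k, best_score)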

Understanding GridSearchCV

Let’s go ahead and import the GridSearchCV class.

from sklearn.model_selection import GridSearchCV

Define the Parameter Grid

We now define the parameter grid (param_grid): a Python dictionary whose keys are the names of the hyperparameters we want to tune and whose values are the lists of candidate values to search over.

# define the parameter values that should be searched
k_range = list(range(1, 31))
# create a parameter grid: map the parameter names to the values that should be searched
param_grid = dict(n_neighbors=k_range)
print(param_grid)


# param_grid
{'n_neighbors': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30]}

We now instantiate GridSearchCV. Note that we pass param_grid instead of fixing n_neighbors on the classifier, as we did earlier with cross_val_score. Why does this work?

Remember, param_grid is a dictionary whose key is n_neighbors and whose value is the list of candidate values for n_neighbors. GridSearchCV therefore sets n_neighbors to the value at index i of that list in the i-th run, so every candidate value gets cross-validated (whatever n_neighbors is currently set on knn gets overridden by the grid).
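Under the hood, GridSearchCV expands this dictionary into a list of candidate parameter settings and cross-validates each one. We can peek at that expansion ourselves with scikit-learn’s ParameterGrid; this peek is only for illustration, GridSearchCV does it internally.

from sklearn.model_selection import ParameterGrid
# each entry is one candidate setting that GridSearchCV will cross-validate
candidates = list(ParameterGrid(param_grid))
print(len(candidates))   # 30 candidate settings
print(candidates[:3])    # [{'n_neighbors': 1}, {'n_neighbors': 2}, {'n_neighbors': 3}]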

Instantiate, fit grid and view results

# instantiate the grid
grid = GridSearchCV(knn, param_grid, cv=10, scoring='accuracy', return_train_score=False)

We now go ahead and fit the grid with data, and access the cv_results_ attribute to get the mean accuracy score after 10-fold cross-validation, the standard deviation, and the parameter values. For convenience, we may store the results in a pandas DataFrame. The mean and standard deviation of the accuracy scores for n_neighbors = 1 to 10 are shown below.

# fit the grid with data
grid.fit(X, y)
# view the results as a pandas DataFrame
import pandas as pd
pd.DataFrame(grid.cv_results_)[['mean_test_score', 'std_test_score', 'params']]

# Output

mean_test_score std_test_score  params
0   0.960000    0.053333    {'n_neighbors': 1}
1   0.953333    0.052068    {'n_neighbors': 2}
2   0.966667    0.044721    {'n_neighbors': 3}
3   0.966667    0.044721    {'n_neighbors': 4}
4   0.966667    0.044721    {'n_neighbors': 5}
5   0.966667    0.044721    {'n_neighbors': 6}
6   0.966667    0.044721    {'n_neighbors': 7}
7   0.966667    0.044721    {'n_neighbors': 8}
8   0.973333    0.032660    {'n_neighbors': 9}
9   0.966667    0.044721    {'n_neighbors': 10}

When using cross_val_score, we had to eyeball the accuracy scores to identify the best hyperparameter value and, to make that easier, we plotted the hyperparameter values against the corresponding cross-validated accuracy scores!

Sounds good but doesn’t seem to be a great option!
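That said, if we still want the visual check, it no longer needs the manual loop: the same plot can be drawn straight from cv_results_. A small sketch, assuming the grid fitted above and the k_range list (the rows of cv_results_ appear in the same order as k_range here, since n_neighbors is the only parameter searched):

# plot the value of K (x-axis) versus the grid's cross-validated accuracy (y-axis)
results = pd.DataFrame(grid.cv_results_)
plt.plot(k_range, results['mean_test_score'])
plt.xlabel('Value of K for KNN')
plt.ylabel('Cross-Validated Accuracy')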

Once we’ve completed the grid search, the following attributes can be very useful!
We can choose to examine:
☑ the best_score_, the highest cross-validated accuracy score,
☑ the best_params_, the hyperparameter values that gave that score, and
☑ the best_estimator_, the model configured with those best hyperparameter values.
Let us now examine these for our example.

# examine the best model
print(grid.best_score_)
print(grid.best_params_)
print(grid.best_estimator_)

# Output
0.9800000000000001

{'n_neighbors': 13}

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=13, p=2,
                     weights='uniform')

K=13 has been chosen; remember, K=13 was one of the values of K that gave the highest cross-validated accuracy score. ✔ (When several values tie for the best score, GridSearchCV simply reports the first of them in the search order, which is why K=13 appears here rather than the K=20 we preferred earlier.)
So far so good!
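A convenient side effect: because GridSearchCV refits the best estimator on the entire dataset by default (refit=True), the fitted grid object can be used directly for prediction. A minimal sketch on a made-up iris sample (the measurement values below are purely illustrative):

# the grid exposes the refit best model, so we can predict through it directly
sample = [[5.1, 3.5, 1.4, 0.2]]   # hypothetical sepal/petal measurements
print(grid.predict(sample))
print(grid.best_estimator_.predict(sample))   # equivalent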


Searching for multiple parameters

In this example, the only hyperparameter that we searched for was n_neighbors. What if there were many such hyperparameters?

We may think, “Why not tune each hyperparameter independently?”
Well, we may independently search for the optimal values for each of the hyperparameters; but the model may perform best at some values of the parameters that are very different from the individual best values. So, we have to search for the combination of the parameters that optimizes performance rather than the individual best parameters.
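In other words, a grid search over two hyperparameters evaluates every combination of the candidate values (the Cartesian product of the lists), not each list in isolation. A tiny illustration with made-up, shortened candidate lists:

from itertools import product
# hypothetical, shortened candidate lists, just to show the combinations
n_neighbors_candidates = [5, 13]
weight_candidates = ['uniform', 'distance']
for k, w in product(n_neighbors_candidates, weight_candidates):
    print({'n_neighbors': k, 'weights': w})   # 2 x 2 = 4 combinations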

Let us build on the same KNeighborsClassifier example: in addition to n_neighbors, let us also search for the optimal weighting strategy.
The default option, ‘uniform’, weights all points equally, while the ‘distance’ option weights points by the inverse of their distance; closer neighbors of a query point then have a greater influence than neighbors that are further away.

# define the parameter values that should be searched
k_range = list(range(1, 31))
weight_options = ['uniform', 'distance']
# create a parameter grid: map the parameter names to the values that should be searched
param_grid = dict(n_neighbors=k_range, weights=weight_options)
print(param_grid)

# param_grid
{'n_neighbors': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30],

'weights': ['uniform', 'distance']}

Let us instantiate and fit the grid and view the results, as before.

# instantiate and fit the grid
grid = GridSearchCV(knn, param_grid, cv=10, scoring='accuracy', return_train_score=False)
grid.fit(X, y)
# view the results
pd.DataFrame(grid.cv_results_)[['mean_test_score', 'std_test_score', 'params']]

# Results
mean_test_score std_test_score  params
0   0.960000    0.053333    {'n_neighbors': 1, 'weights': 'uniform'}
1   0.960000    0.053333    {'n_neighbors': 1, 'weights': 'distance'}
2   0.953333    0.052068    {'n_neighbors': 2, 'weights': 'uniform'}
3   0.960000    0.053333    {'n_neighbors': 2, 'weights': 'distance'}
4   0.966667    0.044721    {'n_neighbors': 3, 'weights': 'uniform'}
5   0.966667    0.044721    {'n_neighbors': 3, 'weights': 'distance'}
6   0.966667    0.044721    {'n_neighbors': 4, 'weights': 'uniform'}
7   0.966667    0.044721    {'n_neighbors': 4, 'weights': 'distance'}
8   0.966667    0.044721    {'n_neighbors': 5, 'weights': 'uniform'}
9   0.966667    0.044721    {'n_neighbors': 5, 'weights': 'distance'}
10  0.966667    0.044721    {'n_neighbors': 6, 'weights': 'uniform'}
11  0.966667    0.044721    {'n_neighbors': 6, 'weights': 'distance'}
12  0.966667    0.044721    {'n_neighbors': 7, 'weights': 'uniform'}
13  0.966667    0.044721    {'n_neighbors': 7, 'weights': 'distance'}
14  0.966667    0.044721    {'n_neighbors': 8, 'weights': 'uniform'}
15  0.966667    0.044721    {'n_neighbors': 8, 'weights': 'distance'}
16  0.973333    0.032660    {'n_neighbors': 9, 'weights': 'uniform'}
17  0.973333    0.032660    {'n_neighbors': 9, 'weights': 'distance'}
18  0.966667    0.044721    {'n_neighbors': 10, 'weights': 'uniform'}

Results of Grid Search for two parameters (Truncated View of the DataFrame)
Only the first few rows of the results are shown above. We actually have 30*2=60 candidate models (30 possible values for n_neighbors times 2 possible values for weights), and since we chose 10-fold cross-validation, 60*10=600 model fits are performed. Time to look at the best score and the parameters that yielded it.

# examine the best model
print(grid.best_score_)
print(grid.best_params_)

# best score and best parameters
0.9800000000000001
{'n_neighbors': 13, 'weights': 'uniform'}

We obtain the same best cross-validated accuracy score of 0.98, with n_neighbors=13 and weights='uniform'. Now, suppose we have to tune 4 hyperparameters and have a list of 10 candidate values for each. The grid then contains 10*10*10*10 = 10,000 candidate models, and with 10-fold cross-validation that means 100,000 model fits. Clearly, things scale up very quickly and can soon become computationally infeasible.
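Those counts are easy to verify with ParameterGrid; a quick sketch with a made-up four-parameter grid (the parameter names are purely illustrative):

from sklearn.model_selection import ParameterGrid
# hypothetical grid: 4 hyperparameters with 10 candidate values each
big_grid = {'param_a': list(range(10)),
            'param_b': list(range(10)),
            'param_c': list(range(10)),
            'param_d': list(range(10))}
n_candidates = len(ParameterGrid(big_grid))
print(n_candidates)        # 10000 candidate settings
print(n_candidates * 10)   # 100000 model fits with 10-fold cross-validation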

More efficient hyperparameter searches such as Randomized Search and Informed Search can be very useful in overcoming this drawback. Let us cover them in a subsequent blog post.
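As a quick preview, scikit-learn’s RandomizedSearchCV exposes the same interface as GridSearchCV but evaluates only a fixed number of randomly sampled candidate settings (n_iter) instead of the full grid; a minimal sketch, reusing the param_grid defined above:

from sklearn.model_selection import RandomizedSearchCV
# sample 10 of the 60 candidate settings at random instead of trying them all
rand = RandomizedSearchCV(knn, param_grid, cv=10, scoring='accuracy',
                          n_iter=10, random_state=5, return_train_score=False)
rand.fit(X, y)
print(rand.best_score_, rand.best_params_)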
Until then, Happy Learning!


References

[1] Here’s the link to the Google Colab notebook for the example used above.
[2] Introduction to Machine Learning in Python with scikit-learn by DataSchool.
[3] Scikit-learn documentation: GridSearchCV
https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html

Cover Image: Photo by Kelly Sikkema on Unsplash
