In my last post about training on multispectral images I was able to improve my classification accuracy on the EuroSat dataset from 75% to 90% using a very simple model. While 90% is much better, there are likely further improvements to be had by improving the model.
When seeking to improve the model there are two overall strategies one might attempt. One is to analyze the problem and try to find a model that fits that understanding. This is theoretically nice, but the promise of machine learning is that the computer can do some of this work for us. The other approach is to try different combinations and see what works. Doing this manually is rather tedious, so we can let the computer do it for us. This approach of letting the computer explore different ways of setting up our model is called hyperparameter optimization. The "hyperparameter" part refers to things like the number of layers, the choice of activation functions and the number of nodes per layer, as opposed to the ordinary parameters, which are the weights of the model. Analysis and hyperparameter optimization can of course be combined. It can be very powerful to alternate between them, using analysis and understanding of the problem to create the broad strokes of the model and then using hyperparameter optimization to find the exact hyperparameter values of that model.
In this article I will extend my earlier code for classifying the EuroSat dataset and show how you can use hyperparameter optimization to help find a better model. I will use a library called Talos to do this, but it can be done rather easily without any dependencies as well if there is a reason to.
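To give an idea of what the dependency-free variant could look like, here is a minimal sketch of a random search over a small parameter space. It is not the code used in this article: `random_search` and `toy_score` are hypothetical names, and `toy_score` is only a stand-in for "train the model and return the validation accuracy".

```python
import itertools
import random

def random_search(space, evaluate, rounds, seed=0):
    """Evaluate `rounds` randomly chosen combinations, best score first."""
    keys = sorted(space)
    # Build every combination of the listed values, one dict per combination.
    grid = [dict(zip(keys, values))
            for values in itertools.product(*(space[k] for k in keys))]
    rng = random.Random(seed)
    chosen = rng.sample(grid, min(rounds, len(grid)))
    results = [(evaluate(p), p) for p in chosen]
    # Sort on the score only, so the parameter dicts are never compared.
    results.sort(key=lambda r: r[0], reverse=True)
    return results

def toy_score(p):
    # Hypothetical stand-in for training a model and measuring its accuracy.
    return 1.0 - abs(p['hidden_layers'] - 2) * 0.1 - abs(p['first_neuron'] - 64) / 1000

space = {'hidden_layers': [1, 2, 3], 'first_neuron': [32, 64]}
best_score, best_params = random_search(space, toy_score, rounds=4)[0]
print(best_score, best_params)
```

A library like Talos adds bookkeeping, reporting and smarter sampling on top of this basic loop.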
This article is intended to demonstrate how to write code for hyperparameter optimization. It is not an attempt to create an optimal solution to the classification of the EuroSat images. For much better results on that, you can read the article EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification by Patrick Helber, Benjamin Bischke, Andreas Dengel and Damian Borth.
To implement hyperparameter optimization we need to define our parameter space and then tell our code to explore it, in our case using the Talos library.
In this example we define the parameters with the following Python code:
params = {
    'epoch': [120],
    'batch_size': [32],
    'activation': [relu],  # the relu function imported from Keras, not a string
    # convolution layers in the beginning
    'conv_hidden_layers': [1,2,3],
    'conv_depth_shape': ['brick'],
    'conv_size_shape': ['brick'],
    'conv_depth_first_neuron': [20,40,60],
    'conv_depth_last_neuron': [20,40,60],
    'conv_size_first_neuron': [5,7],
    'conv_size_last_neuron': [3,5],
    # fully connected layers at the end
    'first_neuron': [32,64],
    'last_neuron': [32,64],
    'shapes': ['brick'],
    'hidden_layers': [1,2,3],
    'dropout': [0.05]
}
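It can be useful to know how large this parameter space is before starting a run. The sketch below counts the combinations in the dictionary above and compares that to the round limit of 30 used for the run in this article. The only change to the dictionary is that 'relu' is written as a string, an assumption made so the snippet runs without Keras.

```python
import math

# Same search space as above, with 'relu' as a plain string.
params = {
    'epoch': [120], 'batch_size': [32], 'activation': ['relu'],
    'conv_hidden_layers': [1, 2, 3],
    'conv_depth_shape': ['brick'], 'conv_size_shape': ['brick'],
    'conv_depth_first_neuron': [20, 40, 60],
    'conv_depth_last_neuron': [20, 40, 60],
    'conv_size_first_neuron': [5, 7],
    'conv_size_last_neuron': [3, 5],
    'first_neuron': [32, 64], 'last_neuron': [32, 64],
    'shapes': ['brick'], 'hidden_layers': [1, 2, 3], 'dropout': [0.05],
}

# Each hyperparameter multiplies the grid by its number of candidate values.
total = math.prod(len(values) for values in params.values())
round_limit = 30

print(total)                         # 1296
print(f"{round_limit / total:.1%}")  # 2.3%
```

So 30 rounds only sample a small part of the grid, which is worth keeping in mind when reading the results.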
And then pass it to Talos like this:
verbose=True
round_limit=30 # NOTE Set this to however many rounds you want to test with
model=create_model(training_set,validation_set,verbose)
# Scan expects arrays, so we hand it the first batch of each generator.
# The model function itself trains from the generators.
dummyX,dummyY=training_set[0]
testX,testY=validation_set[0]
validation_set.on_epoch_end()
tt = talos.Scan(
    x=dummyX,
    y=dummyY,
    params=params,
    model=model,
    x_val=testX,
    y_val=testY,
    experiment_name='example.csv',
    print_params=True,
    round_limit=round_limit,
)
Note the call to the create_model function. This is where we have moved our Keras model definition, and it is defined like this:
from keras.models import Sequential
from keras.layers import Conv2D, Activation, Flatten, Dense
from keras.optimizers import Adam
# network_shape and hidden_layers are helpers imported from Talos;
# the exact import path depends on the Talos version.

class __Config__(object):
    pass

config=__Config__()
config.optimizer=Adam
config.optimizer_parameters={'lr':0.0001,'decay':0.001}
config.loss='categorical_crossentropy'
config.metric=['accuracy']

def create_model(training_set,validation_set,verbose=False):
    def _create_conv_shape_(params):
        # Turn a layer count plus first/last values into one value per layer
        def shape(params):
            if params['hidden_layers']==1:
                return [params['first_neuron']]
            if params['hidden_layers']==2:
                return [params['first_neuron'],params['last_neuron']]
            else:
                params=params.copy()
                params['hidden_layers']-=2
                s_list=network_shape.network_shape(params,params['last_neuron'])
                return [params['first_neuron'],*s_list,params['last_neuron']]
        conv_depth_params={
            'hidden_layers': params['conv_hidden_layers'],
            'shapes': params['conv_depth_shape'],
            'first_neuron': params['conv_depth_first_neuron'],
            'last_neuron': params['conv_depth_last_neuron'],
        }
        conv_size_params={
            'hidden_layers': params['conv_hidden_layers'],
            'shapes': params['conv_size_shape'],
            'first_neuron': params['conv_size_first_neuron'],
            'last_neuron': params['conv_size_last_neuron'],
        }
        conv_depth_shape = shape(conv_depth_params)
        conv_size_shape = shape(conv_size_params)
        # One (depth, kernel size) pair per convolutional layer
        conv_shape=zip(conv_depth_shape,conv_size_shape)
        return conv_shape

    # This is the function Talos calls once per round. The array arguments
    # are ignored since we train from the generators instead.
    def model(dummyXtrain,dummyYtrain,dummyXval,dummyYval,params):
        conv_shape=_create_conv_shape_(params)
        model = Sequential()
        for i,(depth,size) in enumerate(conv_shape):
            if i==0:
                model.add(Conv2D(depth, size, input_shape=training_set.shape))
            else:
                model.add(Conv2D(depth, size))
            model.add(Activation('relu'))
        model.add(Flatten())
        hidden_layers(model, params, params['last_neuron'])
        model.add(Dense(training_set.num_classes))
        model.add(Activation('softmax'))
        optimizer=config.optimizer(**config.optimizer_parameters)
        model.compile(loss=config.loss,
                      optimizer=optimizer,
                      metrics=config.metric)
        training_set.batch_size=params['batch_size']
        validation_set.batch_size=params['batch_size']
        history = model.fit_generator(
            training_set,
            validation_data=validation_set,
            epochs=params['epoch'],
            # 'verbose' is not part of the search space, so use the
            # argument passed to create_model
            verbose=int(verbose),
        )
        return history,model
    return model
While this is a bit more complex than our previous model definition, note that half the code is there just to let us vary the number of convolutional layers, something that Talos does not automate at the time of writing.
Since this runs the training over and over again, the code takes quite a while to run (for me it ran overnight). I therefore think it is prudent to save the results of the run to a file rather than just printing them. Doing this lets us analyze the output in new ways without having to rerun the code. To do this I have defined some helper functions that select the parts of the Talos output (stored in the tt object) to keep and write them to disk. The call looks like this:
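To make the layer-shape logic easier to follow, here is a small standalone sketch of how a layer count plus a first and last value become one value per layer. The function `brick_shape` is a hypothetical stand-in for the Talos `network_shape` helper, written under the assumption that the 'brick' shape gives every intermediate layer the same width as the first one. `layer_sizes` is likewise just an illustrative name.

```python
def brick_shape(params, last_neuron):
    # Assumed behaviour of the 'brick' shape: every intermediate layer
    # is as wide as the first one. last_neuron is unused for this shape.
    return [params['first_neuron']] * params['hidden_layers']

def layer_sizes(params):
    n = params['hidden_layers']
    if n == 1:
        return [params['first_neuron']]
    if n == 2:
        return [params['first_neuron'], params['last_neuron']]
    # Three or more layers: fix the first and last, fill in the middle.
    inner = dict(params, hidden_layers=n - 2)
    middle = brick_shape(inner, params['last_neuron'])
    return [params['first_neuron'], *middle, params['last_neuron']]

# Filter counts and kernel sizes for three convolutional layers.
depths = layer_sizes({'hidden_layers': 3, 'first_neuron': 60, 'last_neuron': 20})
sizes = layer_sizes({'hidden_layers': 3, 'first_neuron': 7, 'last_neuron': 3})
conv_layers = list(zip(depths, sizes))
print(conv_layers)  # [(60, 7), (60, 7), (20, 3)]
```

Each pair in the result corresponds to one Conv2D layer, with the number of filters first and the kernel size second.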
t = project_object(tt,'params','saved_models','saved_weights','data','details','round_history')
save_object(t,'example.pickle')
And the helper functions are implemented like this:
import pickle

def save_object(obj, filename):
    # Write obj to disk as a pickle file
    with open(filename, 'wb') as output:
        pickle.dump(obj, output, protocol=2)

def project_object(obj,*attributes):
    # Copy only the named attributes of obj into a plain dict
    out={}
    for a in attributes:
        out[a]=getattr(obj,a)
    return out
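As a quick check that the projection and the save work together, here is a minimal round-trip sketch. The `fake_scan` object is a hypothetical stand-in for the Talos result, and `load_object` is the matching loader. One benefit of projecting first, at least in this toy case, is that attributes that cannot be pickled are simply left out.

```python
import os
import pickle
import tempfile
from types import SimpleNamespace

def project_object(obj, *attributes):
    # Copy only the named attributes into a plain dict.
    return {a: getattr(obj, a) for a in attributes}

def save_object(obj, filename):
    with open(filename, 'wb') as output:
        pickle.dump(obj, output, protocol=2)

def load_object(filename):
    with open(filename, 'rb') as f:
        return pickle.load(f)

# Hypothetical stand-in for the scan result. The lambda cannot be pickled,
# so pickling the whole object would fail.
fake_scan = SimpleNamespace(
    params={'dropout': [0.05]},
    details='demo run',
    callback=lambda: None,
)

snapshot = project_object(fake_scan, 'params', 'details')
path = os.path.join(tempfile.mkdtemp(), 'demo.pickle')
save_object(snapshot, path)
restored = load_object(path)
print(restored)
```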
To analyze the output I made another script, called example_inspect.py, with the following code:
#!/usr/bin/env python3
import pickle
import pandas as pd
from prettytable import PrettyTable, PLAIN_COLUMNS

def save_object(obj, filename):
    with open(filename, 'wb') as output:
        pickle.dump(obj, output, protocol=2)

def load_object(filename):
    with open(filename, 'rb') as f:
        return pickle.load(f)

def print_hyperparameter_search_stats(t):
    # The search space, with very long value lists abbreviated
    print(" *** params: ",{ p:(v if len(v)<200 else [v[0],v[1],v[2],'...',v[-1]]) for p,v in t['params'].items()})
    print()
    # All rounds, best validation accuracy first
    print(" *** data ",type(t['data']),len(t['data']))
    print(t['data'].sort_values('val_acc',ascending=False).to_string())
    print()
    # The same table without the columns that never vary between rounds
    distinct_data=t['data']
    nunique = distinct_data.apply(pd.Series.nunique)
    cols_to_drop = nunique[nunique == 1].index
    distinct_data = distinct_data.drop(cols_to_drop, axis=1)
    print(nunique,cols_to_drop)
    print(" *** distinct data ",type(distinct_data),len(distinct_data))
    print(distinct_data.sort_values('val_acc',ascending=False).to_string())
    print()
    print(" *** details ",type(t['details']),len(t['details']))
    print(t['details'])
    print()

tt = load_object('example.pickle')
print(tt['details'])
# One table per round with the training history for each epoch
for ttt in tt['round_history']:
    table = PrettyTable()
    table.set_style(PLAIN_COLUMNS)
    iterations=max([len(x) for x in ttt.values()])
    table.add_column('epoch',list(range(1,iterations+1)))
    for key,val in sorted(ttt.items()):
        # Keep the values in epoch order; sorting them would no longer
        # match the epoch column
        table.add_column(key, list(val))
    print(table)
print_hyperparameter_search_stats(tt)
Running this gets us (among other things) the following table (sorted on val acc):
val loss | val acc | loss | acc | conv depth first neuron | conv depth last neuron | conv hidden layers | conv size first neuron | conv size last neuron | first neuron | hidden layers | last neuron |
---|---|---|---|---|---|---|---|---|---|---|---|
0.357977 | 0.910370 | 0.144818 | 0.95325 | 60 | 60 | 3 | 7 | 5 | 32 | 1 | 64 |
0.304265 | 0.902963 | 0.237622 | 0.92300 | 60 | 60 | 3 | 5 | 3 | 64 | 3 | 64 |
0.317458 | 0.901296 | 0.206647 | 0.93250 | 60 | 40 | 3 | 5 | 5 | 64 | 2 | 64 |
0.407395 | 0.900926 | 0.016828 | 0.99700 | 60 | 60 | 2 | 5 | 5 | 64 | 1 | 64 |
0.359424 | 0.898519 | 0.211113 | 0.93300 | 40 | 60 | 3 | 5 | 5 | 32 | 2 | 32 |
0.374605 | 0.888889 | 0.197840 | 0.93875 | 60 | 60 | 2 | 7 | 5 | 32 | 2 | 64 |
0.400512 | 0.879444 | 0.160062 | 0.94700 | 20 | 20 | 3 | 7 | 5 | 64 | 1 | 64 |
0.430621 | 0.876111 | 0.036848 | 0.99050 | 40 | 40 | 1 | 5 | 5 | 64 | 1 | 64 |
0.442248 | 0.873889 | 0.110187 | 0.96850 | 60 | 40 | 1 | 7 | 5 | 64 | 3 | 64 |
0.458507 | 0.872037 | 0.115722 | 0.96400 | 60 | 60 | 1 | 7 | 3 | 64 | 3 | 64 |
0.458475 | 0.868704 | 0.076465 | 0.97725 | 60 | 20 | 1 | 7 | 3 | 32 | 1 | 64 |
0.513024 | 0.867963 | 0.060912 | 0.98375 | 40 | 40 | 1 | 7 | 3 | 64 | 2 | 32 |
0.425103 | 0.864630 | 0.221249 | 0.92350 | 20 | 40 | 2 | 7 | 3 | 64 | 3 | 64 |
0.451033 | 0.863889 | 0.179650 | 0.94750 | 20 | 20 | 2 | 7 | 3 | 64 | 1 | 64 |
0.469304 | 0.862222 | 0.206574 | 0.93975 | 60 | 20 | 1 | 7 | 3 | 32 | 3 | 32 |
0.453625 | 0.861852 | 0.088379 | 0.97775 | 40 | 20 | 2 | 7 | 3 | 64 | 1 | 32 |
0.444886 | 0.861481 | 0.128833 | 0.96150 | 60 | 40 | 1 | 7 | 3 | 32 | 1 | 64 |
0.507459 | 0.858148 | 0.315459 | 0.90150 | 40 | 60 | 2 | 5 | 3 | 64 | 2 | 32 |
0.486624 | 0.852963 | 0.354109 | 0.89050 | 20 | 20 | 2 | 5 | 5 | 64 | 1 | 64 |
0.485049 | 0.851111 | 0.145122 | 0.96050 | 20 | 20 | 1 | 5 | 5 | 64 | 1 | 64 |
0.559950 | 0.849815 | 0.027219 | 0.99625 | 20 | 40 | 1 | 7 | 5 | 64 | 1 | 32 |
0.494588 | 0.847963 | 0.150461 | 0.95500 | 20 | 60 | 1 | 5 | 3 | 64 | 1 | 32 |
0.548905 | 0.824444 | 0.496872 | 0.83700 | 20 | 40 | 1 | 5 | 5 | 32 | 2 | 32 |
0.613613 | 0.807037 | 0.564718 | 0.80950 | 60 | 60 | 1 | 7 | 3 | 32 | 2 | 32 |
0.626898 | 0.798519 | 0.563555 | 0.81325 | 60 | 40 | 1 | 7 | 5 | 32 | 2 | 64 |
1.310021 | 0.495741 | 1.370861 | 0.46675 | 40 | 20 | 1 | 7 | 5 | 32 | 3 | 32 |
2.302581 | 0.114630 | 2.302587 | 0.09625 | 40 | 60 | 1 | 7 | 5 | 32 | 2 | 64 |
2.302576 | 0.114630 | 2.302588 | 0.10000 | 60 | 60 | 1 | 5 | 5 | 32 | 3 | 32 |
14.327196 | 0.111111 | 14.506286 | 0.10000 | 60 | 20 | 1 | 7 | 5 | 64 | 1 | 32 |
2.302568 | 0.110556 | 2.302587 | 0.09900 | 20 | 40 | 3 | 5 | 3 | 32 | 3 | 64 |
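
The table above has no columns for epoch, batch size or dropout, since those never vary between rounds and the inspect script drops such columns. That step can be sketched on its own with a toy DataFrame. The three rows below are copied from the table, and the epoch and batch size values come from the parameter dictionary.

```python
import pandas as pd

data = pd.DataFrame({
    'val_acc': [0.910370, 0.902963, 0.900926],
    'epoch': [120, 120, 120],
    'batch_size': [32, 32, 32],
    'conv_hidden_layers': [3, 3, 2],
    'first_neuron': [32, 64, 64],
})

# Count the distinct values per column and drop columns with only one.
nunique = data.nunique()
constant_columns = nunique[nunique == 1].index
distinct = data.drop(columns=constant_columns)

print(distinct.sort_values('val_acc', ascending=False).to_string())
```

What remains is only the columns that actually differ between rounds, which makes the table much easier to read.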
As can be seen from the above table our best result of this run is 91% validation accuracy, only a small improvement over the 90% we got earlier. This can be either because I was lucky with my initial guess of parameters or because we are changing the wrong parameters in the model in our experiment. As this is meant to demonstrate the technique I will not delve deeper into this in this article. Despite this I hope this gives you an overview of how hyperparameter optimization can be realized.
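For readers who do want to dig a little further, one possible starting point is to group the rounds by a single hyperparameter and compare the validation accuracy within each group. The sketch below does this for the number of convolutional layers, using the val acc and conv hidden layers columns of all 30 rows in the table above. The median is used here because a few rounds failed to train at all. With only 30 rounds this gives hints rather than conclusions.

```python
import pandas as pd

# (val_acc, conv_hidden_layers) for every row of the table above.
rows = [
    (0.910370, 3), (0.902963, 3), (0.901296, 3), (0.900926, 2), (0.898519, 3),
    (0.888889, 2), (0.879444, 3), (0.876111, 1), (0.873889, 1), (0.872037, 1),
    (0.868704, 1), (0.867963, 1), (0.864630, 2), (0.863889, 2), (0.862222, 1),
    (0.861852, 2), (0.861481, 1), (0.858148, 2), (0.852963, 2), (0.851111, 1),
    (0.849815, 1), (0.847963, 1), (0.824444, 1), (0.807037, 1), (0.798519, 1),
    (0.495741, 1), (0.114630, 1), (0.114630, 1), (0.111111, 1), (0.110556, 3),
]
data = pd.DataFrame(rows, columns=['val_acc', 'conv_hidden_layers'])

# Number of rounds, median and best validation accuracy per layer count.
summary = data.groupby('conv_hidden_layers')['val_acc'].agg(['count', 'median', 'max'])
print(summary)
```

In this run the median validation accuracy rises with the number of convolutional layers, which suggests that deeper convolutional stacks might be worth exploring in a follow-up run.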
Our new source now looks like this (example.py) and the inspect code like this (example_inspect.py). The generator code is unchanged.
Feel free to use this code for any and all purposes. Consider it to be in the public domain or, if that is not workable for you, you can use it under the terms of the MIT License.