In my last post, about training on multispectral images, I improved my classification accuracy on the EuroSat dataset from 75% to 90% using a very simple model. While 90% is much better, further improvements are likely possible by improving the model.

When seeking to improve the model there are two overall strategies one might attempt. One is to analyze the problem and try to find a model that adapts nicely to that understanding. This is theoretically appealing, but the promise of machine learning is that the computer can do some of this work for us. The other approach is to try different combinations and see what works. Doing this manually is rather tedious, so we can let the computer do it for us. This approach of letting the computer explore different ways of setting up our model is called hyperparameter optimization. The hyperparameters are things like the number of layers, the choice of activation functions and the number of nodes per layer, as opposed to the ordinary parameters, which are the weights of the model.

Analysis and hyperparameter optimization can of course be combined. It can be very powerful to alternate between them, using analysis and understanding of the problem to create the broad strokes of the model and then using hyperparameter optimization to find the exact hyperparameter values for that model.

In this article I will extend my earlier code for classifying the EuroSat dataset and show how you can use hyperparameter optimization to help find a better model. I will use a library called Talos to do this, but it can be done rather easily without any dependencies as well, if there is a reason to.
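To make the dependency-free route concrete, here is a minimal sketch of a random search over a parameter space using only the standard library. It is not how Talos works internally. The names random_search and toy_score are hypothetical, and toy_score stands in for "train the model and return its validation accuracy".

```python
import itertools
import random


def random_search(param_space, evaluate, round_limit, seed=0):
    """Evaluate up to round_limit randomly chosen combinations from param_space."""
    keys = sorted(param_space)
    # Every combination of one value per hyperparameter (the full grid).
    grid = list(itertools.product(*(param_space[k] for k in keys)))
    rng = random.Random(seed)
    chosen = rng.sample(grid, min(round_limit, len(grid)))
    results = []
    for values in chosen:
        params = dict(zip(keys, values))
        results.append((evaluate(params), params))
    # Best score first.
    results.sort(key=lambda r: r[0], reverse=True)
    return results


def toy_score(params):
    # Hypothetical stand-in for training a model and returning validation
    # accuracy. It peaks at hidden_layers=2, first_neuron=64.
    return -abs(params['hidden_layers'] - 2) - abs(params['first_neuron'] - 64) / 64


space = {'hidden_layers': [1, 2, 3], 'first_neuron': [32, 64]}
results = random_search(space, toy_score, round_limit=4)
best_score, best_params = results[0]
print(best_score, best_params)
```

With a round limit below the size of the grid, the best combination may simply never be tried, which is the trade-off you accept in exchange for a shorter run.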

This article is intended to demonstrate how to write code for hyperparameter optimization. It does not attempt to create an optimal solution to the classification of the EuroSat images. For much better results on that, you can read the article EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification by Patrick Helber, Benjamin Bischke, Andreas Dengel and Damian Borth.

To implement hyperparameter optimization we need to define our parameter space and then tell our code to explore it, in our case using the talos library.

In this example we define the parameters with the following Python code:

    from keras.activations import relu

    params = {
        'epoch': [120],
        'batch_size': [32],
        'activation': [relu],
        # convolution layers at the beginning
        'conv_hidden_layers': [1, 2, 3],
        'conv_depth_shape': ['brick'],
        'conv_size_shape': ['brick'],
        'conv_depth_first_neuron': [20, 40, 60],
        'conv_depth_last_neuron': [20, 40, 60],
        'conv_size_first_neuron': [5, 7],
        'conv_size_last_neuron': [3, 5],
        # fully connected layers at the end
        'first_neuron': [32, 64],
        'last_neuron': [32, 64],
        'shapes': ['brick'],
        'hidden_layers': [1, 2, 3],
        'dropout': [0.05]
    }

And then pass it to talos like this

    verbose = True
    round_limit = 30  # NOTE Set this to however many rounds you want to test with
    model = create_model(training_set, validation_set, verbose)
    dummyX, dummyY = training_set.__getitem__(0)
    testX, testY = validation_set.__getitem__(0)
    validation_set.on_epoch_end()
    tt = talos.Scan(
        x=dummyX,
        y=dummyY,
        params=params,
        model=model,
        x_val=testX,
        y_val=testY,
        experiment_name='example.csv',
        print_params=True,
        round_limit=round_limit,
    )
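To get a feeling for how much of the parameter space a run with round_limit=30 can visit, we can count the combinations the params dictionary describes. The sketch below repeats the dictionary, with the activation written as a string (an assumption made only so the snippet runs without Keras installed).

```python
import math

# The search space from the article. 'relu' is a string here only so that
# this snippet runs without Keras.
params = {
    'epoch': [120],
    'batch_size': [32],
    'activation': ['relu'],
    'conv_hidden_layers': [1, 2, 3],
    'conv_depth_shape': ['brick'],
    'conv_size_shape': ['brick'],
    'conv_depth_first_neuron': [20, 40, 60],
    'conv_depth_last_neuron': [20, 40, 60],
    'conv_size_first_neuron': [5, 7],
    'conv_size_last_neuron': [3, 5],
    'first_neuron': [32, 64],
    'last_neuron': [32, 64],
    'shapes': ['brick'],
    'hidden_layers': [1, 2, 3],
    'dropout': [0.05],
}

# Number of distinct combinations = product of the number of choices per key.
total = math.prod(len(values) for values in params.values())
round_limit = 30
fraction = round_limit / total
print(f"{total} combinations, {round_limit} rounds cover {fraction:.1%}")
```

This reports 1296 combinations, so 30 rounds cover roughly 2.3% of them. Some combinations are effectively the same model (for example, the last-neuron values are not used when there is only one layer), so the real coverage is somewhat better than that figure suggests.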

Note the call to the create_model function. This is where we have moved our Keras model definition, and it is defined like this:

    from keras.models import Sequential
    from keras.layers import Activation, Conv2D, Dense, Flatten
    from keras.optimizers import Adam
    # hidden_layers and network_shape are Talos helpers; their import path
    # depends on the Talos version you have installed.

    class __Config__(object):
        pass

    config = __Config__()
    config.optimizer = Adam
    config.optimizer_parameters = {'lr': 0.0001, 'decay': 0.001}
    config.loss = 'categorical_crossentropy'
    config.metric = ['accuracy']

    def create_model(training_set, validation_set, verbose=False):

        def _create_conv_shape_(params):
            # Build one (depth, kernel size) pair per convolutional layer.
            def shape(params):
                if params['hidden_layers'] == 1:
                    return [params['first_neuron']]
                if params['hidden_layers'] == 2:
                    return [params['first_neuron'], params['last_neuron']]
                else:
                    # Let Talos fill in the layers between the first and the last.
                    params = params.copy()
                    params['hidden_layers'] -= 2
                    s_list = network_shape.network_shape(params, params['last_neuron'])
                    return [params['first_neuron'], *s_list, params['last_neuron']]

            conv_depth_params = {
                'hidden_layers': params['conv_hidden_layers'],
                'shapes': params['conv_depth_shape'],
                'first_neuron': params['conv_depth_first_neuron'],
                'last_neuron': params['conv_depth_last_neuron'],
            }
            conv_size_params = {
                'hidden_layers': params['conv_hidden_layers'],
                'shapes': params['conv_size_shape'],
                'first_neuron': params['conv_size_first_neuron'],
                'last_neuron': params['conv_size_last_neuron'],
            }
            conv_depth_shape = shape(conv_depth_params)
            conv_size_shape = shape(conv_size_params)
            conv_shape = zip(conv_depth_shape, conv_size_shape)
            return conv_shape

        def model(dummyXtrain, dummyYtrain, dummyXval, dummyYval, params):
            conv_shape = _create_conv_shape_(params)
            model = Sequential()
            for i, (depth, size) in enumerate(conv_shape):
                if i == 0:
                    model.add(Conv2D(depth, size, input_shape=training_set.shape))
                else:
                    model.add(Conv2D(depth, size))
                model.add(Activation('relu'))
            model.add(Flatten())
            hidden_layers(model, params, params['last_neuron'])
            model.add(Dense(training_set.num_classes))
            model.add(Activation('softmax'))
            optimizer = config.optimizer(**config.optimizer_parameters)
            model.compile(loss=config.loss, optimizer=optimizer, metrics=config.metric)
            training_set.batch_size = params['batch_size']
            validation_set.batch_size = params['batch_size']
            history = model.fit_generator(
                training_set,
                validation_data=validation_set,
                epochs=params['epoch'],
                # Use the verbose argument of create_model; the params
                # dictionary shown above has no 'verbose' key.
                verbose=int(verbose),
            )
            return history, model

        return model

While this is a bit more complex than our previous model definition, note that half the code is just there to let us vary the number of convolutional layers, something that Talos does not automate at the time of writing.
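To see what the layer-varying part of create_model produces, here is a standalone sketch of the same logic without Talos or Keras. The helper brick is my assumption of what Talos returns for the 'brick' shape (every layer gets first_neuron units), and the names layer_values and conv_shape are hypothetical.

```python
def brick(first_neuron, layers):
    # Assumed behaviour of the Talos 'brick' shape: all layers are the same size.
    return [first_neuron] * layers


def layer_values(first, last, layers):
    """One value per layer: first, (brick-shaped middle), last."""
    if layers == 1:
        return [first]
    if layers == 2:
        return [first, last]
    return [first, *brick(first, layers - 2), last]


def conv_shape(params):
    """Pair up filter depth and kernel size for each convolutional layer."""
    n = params['conv_hidden_layers']
    depths = layer_values(params['conv_depth_first_neuron'],
                          params['conv_depth_last_neuron'], n)
    sizes = layer_values(params['conv_size_first_neuron'],
                         params['conv_size_last_neuron'], n)
    return list(zip(depths, sizes))


example = {
    'conv_hidden_layers': 3,
    'conv_depth_first_neuron': 60,
    'conv_depth_last_neuron': 40,
    'conv_size_first_neuron': 7,
    'conv_size_last_neuron': 5,
}
print(conv_shape(example))
```

For this example the result is three layers, [(60, 7), (60, 7), (40, 5)]. Each pair becomes one Conv2D(depth, size) layer in the model.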

Because this runs the training over and over again, the code takes quite a while to run (for me it took overnight). I therefore think it is prudent to save the results of the run to a file rather than just printing them. Doing this allows us to analyze the output in new ways without having to rerun the code. To do this I have defined some helper functions that select which parts of the Talos output (stored in the tt object) to keep and write them to disk. The call looks like this:

    t = project_object(tt, 'params', 'saved_models', 'saved_weights', 'data', 'details', 'round_history')
    save_object(t, 'example.pickle')

And the helper functions are implemented like this

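    import pickle

    def save_object(obj, filename):
        # Write obj to disk as a pickle file.
        with open(filename, 'wb') as output:
            pickle.dump(obj, output, protocol=2)

    def project_object(obj, *attributes):
        # Copy only the named attributes of obj into a plain dictionary.
        out = {}
        for a in attributes:
            out[a] = getattr(obj, a)
        return out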
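A small round trip shows why projecting before pickling is useful. The fake_scan object below is a hypothetical stand-in for the Talos result, with one attribute that cannot be pickled. The load_object function is the same one used in the inspection script.

```python
import os
import pickle
import tempfile
from types import SimpleNamespace


def save_object(obj, filename):
    with open(filename, 'wb') as output:
        pickle.dump(obj, output, protocol=2)


def load_object(filename):
    with open(filename, 'rb') as f:
        return pickle.load(f)


def project_object(obj, *attributes):
    return {a: getattr(obj, a) for a in attributes}


# Hypothetical stand-in for the Scan result. The lambda cannot be pickled,
# so pickling the whole object would fail.
fake_scan = SimpleNamespace(
    params={'dropout': [0.05]},
    details='30 rounds',
    model=lambda: None,
)

# Keep only the attributes we want and that pickle can handle.
selected = project_object(fake_scan, 'params', 'details')
path = os.path.join(tempfile.mkdtemp(), 'example.pickle')
save_object(selected, path)
restored = load_object(path)
print(restored)
```

The restored dictionary contains only the params and details entries, so the inspection script can load it without needing the model or Talos itself.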

To analyze the output I made another script I called example_inspect.py with the following code

    #!/usr/bin/env python3
    import pickle
    import pandas as pd
    from prettytable import PrettyTable, PLAIN_COLUMNS

    def save_object(obj, filename):
        with open(filename, 'wb') as output:
            pickle.dump(obj, output, protocol=2)

    def load_object(filename):
        with open(filename, 'rb') as f:
            return pickle.load(f)

    def print_hyperparameter_search_stats(t):
        # Shorten very long parameter lists so the printout stays readable.
        shortened = {
            p: (v if len(v) < 200 else [v[0], v[1], v[2], '...', v[-1]])
            for p, v in t['params'].items()
        }
        print(" *** params: ", shortened)
        print()
        print(" *** data ", type(t['data']), len(t['data']))
        print(t['data'].sort_values('val_acc', ascending=False).to_string())
        print()
        # Drop the columns that have the same value in every round.
        distinct_data = t['data']
        nunique = distinct_data.apply(pd.Series.nunique)
        cols_to_drop = nunique[nunique == 1].index
        distinct_data = distinct_data.drop(cols_to_drop, axis=1)
        print(nunique, cols_to_drop)
        print(" *** distinct data ", type(distinct_data), len(distinct_data))
        print(distinct_data.sort_values('val_acc', ascending=False).to_string())
        print()
        print(" *** details ", type(t['details']), len(t['details']))
        print(t['details'])
        print()

    tt = load_object('example.pickle')
    print(tt['details'])

    # Print the per-epoch training history of every round.
    for ttt in tt['round_history']:
        table = PrettyTable()
        table.set_style(PLAIN_COLUMNS)
        iterations = max([len(x) for x in ttt.values()])
        table.add_column('epoch', range(1, iterations + 1))
        for key, val in sorted(ttt.items()):
            # Keep the values in epoch order so they line up with the epoch
            # column (the original sorted each column independently).
            table.add_column(key, list(val))
        print(table)

    print_hyperparameter_search_stats(tt)

Running this gets us (among other things) the following table (sorted on val_acc, best first):

     val_loss   val_acc       loss      acc  conv_depth_first_neuron  conv_depth_last_neuron  conv_hidden_layers  conv_size_first_neuron  conv_size_last_neuron  first_neuron  hidden_layers  last_neuron
     0.357977  0.910370   0.144818  0.95325                       60                      60                   3                       7                      5            32              1           64
     0.304265  0.902963   0.237622  0.92300                       60                      60                   3                       5                      3            64              3           64
     0.317458  0.901296   0.206647  0.93250                       60                      40                   3                       5                      5            64              2           64
     0.407395  0.900926   0.016828  0.99700                       60                      60                   2                       5                      5            64              1           64
     0.359424  0.898519   0.211113  0.93300                       40                      60                   3                       5                      5            32              2           32
     0.374605  0.888889   0.197840  0.93875                       60                      60                   2                       7                      5            32              2           64
     0.400512  0.879444   0.160062  0.94700                       20                      20                   3                       7                      5            64              1           64
     0.430621  0.876111   0.036848  0.99050                       40                      40                   1                       5                      5            64              1           64
     0.442248  0.873889   0.110187  0.96850                       60                      40                   1                       7                      5            64              3           64
     0.458507  0.872037   0.115722  0.96400                       60                      60                   1                       7                      3            64              3           64
     0.458475  0.868704   0.076465  0.97725                       60                      20                   1                       7                      3            32              1           64
     0.513024  0.867963   0.060912  0.98375                       40                      40                   1                       7                      3            64              2           32
     0.425103  0.864630   0.221249  0.92350                       20                      40                   2                       7                      3            64              3           64
     0.451033  0.863889   0.179650  0.94750                       20                      20                   2                       7                      3            64              1           64
     0.469304  0.862222   0.206574  0.93975                       60                      20                   1                       7                      3            32              3           32
     0.453625  0.861852   0.088379  0.97775                       40                      20                   2                       7                      3            64              1           32
     0.444886  0.861481   0.128833  0.96150                       60                      40                   1                       7                      3            32              1           64
     0.507459  0.858148   0.315459  0.90150                       40                      60                   2                       5                      3            64              2           32
     0.486624  0.852963   0.354109  0.89050                       20                      20                   2                       5                      5            64              1           64
     0.485049  0.851111   0.145122  0.96050                       20                      20                   1                       5                      5            64              1           64
     0.559950  0.849815   0.027219  0.99625                       20                      40                   1                       7                      5            64              1           32
     0.494588  0.847963   0.150461  0.95500                       20                      60                   1                       5                      3            64              1           32
     0.548905  0.824444   0.496872  0.83700                       20                      40                   1                       5                      5            32              2           32
     0.613613  0.807037   0.564718  0.80950                       60                      60                   1                       7                      3            32              2           32
     0.626898  0.798519   0.563555  0.81325                       60                      40                   1                       7                      5            32              2           64
     1.310021  0.495741   1.370861  0.46675                       40                      20                   1                       7                      5            32              3           32
     2.302581  0.114630   2.302587  0.09625                       40                      60                   1                       7                      5            32              2           64
     2.302576  0.114630   2.302588  0.10000                       60                      60                   1                       5                      5            32              3           32
    14.327196  0.111111  14.506286  0.10000                       60                      20                   1                       7                      5            64              1           32
     2.302568  0.110556   2.302587  0.09900                       20                      40                   3                       5                      3            32              3           64
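Because the results are saved, the table above can be sliced in new ways without retraining. As one illustration, the sketch below groups the validation accuracy by the number of convolutional layers, using only the standard library and the (conv_hidden_layers, val_acc) pairs copied from the table. The choice of the median as the summary is mine, made because a few rounds failed to train at all and would dominate a mean.

```python
from collections import defaultdict
from statistics import median

# (conv_hidden_layers, val_acc) pairs copied from the table above.
rounds = [
    (3, 0.910370), (3, 0.902963), (3, 0.901296), (2, 0.900926), (3, 0.898519),
    (2, 0.888889), (3, 0.879444), (1, 0.876111), (1, 0.873889), (1, 0.872037),
    (1, 0.868704), (1, 0.867963), (2, 0.864630), (2, 0.863889), (1, 0.862222),
    (2, 0.861852), (1, 0.861481), (2, 0.858148), (2, 0.852963), (1, 0.851111),
    (1, 0.849815), (1, 0.847963), (1, 0.824444), (1, 0.807037), (1, 0.798519),
    (1, 0.495741), (1, 0.114630), (1, 0.114630), (1, 0.111111), (3, 0.110556),
]

# Collect the validation accuracies of all rounds with the same layer count.
by_layers = defaultdict(list)
for layers, val_acc in rounds:
    by_layers[layers].append(val_acc)

# For each layer count: (number of rounds, median validation accuracy).
summary = {
    layers: (len(accs), median(accs))
    for layers, accs in sorted(by_layers.items())
}
for layers, (count, med) in summary.items():
    print(f"conv_hidden_layers={layers}: {count} rounds, median val_acc={med:.4f}")
```

With only 30 rounds, and very few of them per group, any pattern this shows is at most a hint about which parameter to explore next, not a conclusion.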

As can be seen from the above table our best result of this run is 91% validation accuracy, only a small improvement over the 90% we got earlier. This can be either because I was lucky with my initial guess of parameters or because we are changing the wrong parameters in the model in our experiment. As this is meant to demonstrate the technique I will not delve deeper into this in this article. Despite this I hope this gives you an overview of how hyperparameter optimization can be realized.

Our new source now looks like this (example.py) and the inspect code like this (example_inspect.py). The generators code is unchanged.

Feel free to use this code for any and all purposes. Consider it in the public domain, or if that is not workable for you, you can use it under the terms of the MIT License.