Hyperparameter optimization with Keras generators and Talos

In my last post about training on multispectral images I was able to improve my classification accuracy on the EuroSat dataset from 75% to 90% using a very simple model. While 90% is much better, there are likely further improvements possible by improving the model.

When seeking to improve the model there are two overall strategies one might attempt. One is to analyze the problem and try to find a model that nicely fits that understanding. This is theoretically nice, but the promise of machine learning is that the computer can do some of this work for us. The other approach is to try different combinations and see what works. Doing this manually is rather tedious, so we can let the computer do it for us. This approach of letting the computer explore different ways of setting up our model is called hyperparameter optimization, where the hyperparameters are things like the number of layers, the choice of activation functions and the number of nodes per layer. The name distinguishes them from the ordinary parameters, which are the weights of the model. Analysis and hyperparameter optimization can of course be combined. It can be very powerful to alternate between them, using analysis and understanding of the problem to create the broad strokes of the model and then using hyperparameter optimization to find the exact hyperparameter values of that model.

In this article I will extend my earlier code for classifying the EuroSat dataset and show how you can use hyperparameter optimization to help find a better model. I will use a library called Talos to do this, but it can be done rather easily without any dependencies as well if there is a reason to.
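To give an idea of what the dependency-free route could look like, here is a minimal sketch of a random search in plain Python. It is not the code used in this article: the names random_search and toy_score are made up for illustration, and toy_score stands in for "train the model and return the validation accuracy".

```python
import random

def random_search(param_space, evaluate, rounds, seed=0):
    """Try `rounds` random combinations; return (score, params) pairs, best first."""
    rng = random.Random(seed)
    results = []
    for _ in range(rounds):
        # Pick one value per hyperparameter (sampling with replacement,
        # so the same combination can be drawn more than once).
        candidate = {name: rng.choice(values) for name, values in param_space.items()}
        results.append((evaluate(candidate), candidate))
    # Sort on the score only, so ties never compare the dicts.
    results.sort(key=lambda pair: pair[0], reverse=True)
    return results

# Hypothetical stand-in for training a model and measuring validation accuracy.
def toy_score(p):
    return -abs(p['hidden_layers'] - 2) - abs(p['first_neuron'] - 64) / 64

space = {'hidden_layers': [1, 2, 3], 'first_neuron': [32, 64]}
ranking = random_search(space, toy_score, rounds=20)
best_score, best_params = ranking[0]
print(best_score, best_params)
```

A library like Talos adds bookkeeping, result logging and smarter ways of choosing the next combination on top of a loop like this.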

This article is intended to demonstrate how to write code for hyperparameter optimization. It is not an attempt to create an optimal solution to the classification of the EuroSat images. For much better results on that, you can read the article EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification by Patrick Helber, Benjamin Bischke, Andreas Dengel and Damian Borth.

To implement hyperparameter optimization we need to define our parameter space and then tell our code to explore it, in our case using the talos library.

In this example we define the parameters with the following Python code. Each key maps to the list of values to try for that hyperparameter.


params = {
	'epoch': [120],
	'batch_size': [32],
	'activation': [relu],
	# convolution layers at the beginning
	'conv_hidden_layers': [1,2,3],
	'conv_depth_shape': ['brick'],
	'conv_size_shape': ['brick'],
	'conv_depth_first_neuron': [20,40,60],
	'conv_depth_last_neuron': [20,40,60],
	'conv_size_first_neuron': [5,7],
	'conv_size_last_neuron': [3,5],
	# fully connected layers at the end
	'first_neuron': [32,64],
	'last_neuron': [32,64],
	'shapes': ['brick'],
	'hidden_layers': [1,2,3],
	'dropout': [0.05]
}
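To get a feeling for how large this search space is, the number of combinations can be computed as the product of the list lengths. The sketch below repeats the dictionary above, with the one change that 'relu' is written as a string so the snippet runs without Keras.

```python
import math

# Copy of the search space above. Assumption for this sketch only:
# the activation is given as a string instead of the Keras relu function.
params = {
    'epoch': [120],
    'batch_size': [32],
    'activation': ['relu'],
    'conv_hidden_layers': [1, 2, 3],
    'conv_depth_shape': ['brick'],
    'conv_size_shape': ['brick'],
    'conv_depth_first_neuron': [20, 40, 60],
    'conv_depth_last_neuron': [20, 40, 60],
    'conv_size_first_neuron': [5, 7],
    'conv_size_last_neuron': [3, 5],
    'first_neuron': [32, 64],
    'last_neuron': [32, 64],
    'shapes': ['brick'],
    'hidden_layers': [1, 2, 3],
    'dropout': [0.05],
}

# Every combination picks one value from each list.
total = math.prod(len(values) for values in params.values())
round_limit = 30

print(total)                # 1296
print(round_limit / total)  # fraction of the space covered by 30 rounds
```

With 1296 possible combinations, a run of 30 rounds only covers a bit over 2% of the space, which is worth keeping in mind when reading the results.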

And then pass it to talos like this


verbose=True
round_limit=30 # NOTE Set this to however many rounds you want to test with

model=create_model(training_set,validation_set,verbose)

dummyX,dummyY=training_set.__getitem__(0)
testX,testY=validation_set.__getitem__(0)
validation_set.on_epoch_end()
tt = talos.Scan( x=dummyX
		,y=dummyY
		,params=params
		,model=model
		,x_val=testX
		,y_val=testY
		,experiment_name='example.csv'
		,print_params=True
		,round_limit=round_limit
		)

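Note the call to the create_model function. This is where we have moved our Keras model definition. Also note that the x and y values passed to talos.Scan are just a single batch from each generator, used as placeholders: the model function ignores them and trains directly from the generators instead. The function is defined like this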


class __Config__(object):
	pass
config=__Config__()
config.optimizer=Adam
config.optimizer_parameters={'lr':0.0001,'decay':0.001}
config.loss='categorical_crossentropy'
config.metric=['accuracy']

def create_model(training_set,validation_set,verbose=False):
	def _create_conv_shape_(params):
		def shape(params):
			if params['hidden_layers']==1:
				return [params['first_neuron']]
			if params['hidden_layers']==2:
				return [params['first_neuron'],params['last_neuron']]
			else:
				params=params.copy()
				params['hidden_layers']-=2
				s_list=network_shape.network_shape(params,params['last_neuron'])
				return [params['first_neuron'],*s_list,params['last_neuron']]
		conv_depth_params={
			'hidden_layers': params['conv_hidden_layers'],
			'shapes': params['conv_depth_shape'],
			'first_neuron': params['conv_depth_first_neuron'],
			'last_neuron': params['conv_depth_last_neuron'],
		}
		conv_size_params={
			'hidden_layers': params['conv_hidden_layers'],
			'shapes': params['conv_size_shape'],
			'first_neuron': params['conv_size_first_neuron'],
			'last_neuron': params['conv_size_last_neuron'],
		}
		
		conv_depth_shape = shape(conv_depth_params)
		conv_size_shape = shape(conv_size_params)
		conv_shape=zip(conv_depth_shape,conv_size_shape)
		
		return conv_shape

	def model(dummyXtrain,dummyYtrain,dummyXval,dummyYval,params):
		conv_shape=_create_conv_shape_(params)
		
		model = Sequential()

		for i,(depth,size) in enumerate(conv_shape):
			if i==0:
				model.add(Conv2D(depth, size, input_shape=training_set.shape))
			else:
				model.add(Conv2D(depth, size))
			model.add(Activation('relu'))
		
		model.add(Flatten())
		
		hidden_layers(model, params, params['last_neuron'])

		model.add(Dense(training_set.num_classes))
		model.add(Activation('softmax'))

		global config
		optimizer=config.optimizer(**config.optimizer_parameters)
		model.compile(loss=config.loss,
			      optimizer=optimizer,
			      metrics=config.metric)
		
		training_set.batch_size=params['batch_size']
		validation_set.batch_size=params['batch_size']
		
		history = model.fit_generator(
			training_set,
			validation_data=validation_set,
			epochs=params['epoch'],
			verbose=int(verbose),
		)
		return history,model
	return model

While this is a bit more complex than our previous model definition, note that half the code is just there to allow us to vary the number of convolutional layers, something that Talos at the time of writing does not automate.
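The layer-shape logic can be tried out on its own. The following is a simplified sketch of the inner shape helper for the 'brick' case only. It assumes that a 'brick' shape repeats the width of the first layer for every intermediate layer, which is my understanding of what Talos does. The names brick and layer_widths are made up for this example.

```python
def brick(first_neuron, layers):
    # Assumption: a 'brick' shape uses the same width for every layer.
    return [first_neuron] * layers

def layer_widths(hidden_layers, first_neuron, last_neuron):
    """Mirror the inner shape() helper, for the 'brick' case only."""
    if hidden_layers == 1:
        return [first_neuron]
    if hidden_layers == 2:
        return [first_neuron, last_neuron]
    # Three or more layers: fixed first and last, 'brick' in between.
    middle = brick(first_neuron, hidden_layers - 2)
    return [first_neuron, *middle, last_neuron]

depths = layer_widths(3, 60, 40)  # number of filters per convolutional layer
sizes = layer_widths(3, 7, 5)     # kernel size per convolutional layer

# Pair them up the same way create_model does with zip.
conv_shape = list(zip(depths, sizes))
print(conv_shape)  # [(60, 7), (60, 7), (40, 5)]
```

Each (depth, size) pair then becomes one Conv2D layer in the model.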

As this runs the training over and over again, the code will take quite a while to run (for me it took overnight). I therefore think it is prudent to save the results of the run to a file rather than just printing them. Doing this allows us to analyze the output in new ways without having to rerun the code. To do this I have defined some helper functions that select the parts of the Talos output (stored in the tt object) to keep and write them to disk. The call looks like this


t = project_object(tt,'params','saved_models','saved_weights','data','details','round_history')
save_object(t,'example.pickle')

And the helper functions are implemented like this


def save_object(obj, filename):
	with open(filename, 'wb') as output:
		pickle.dump(obj, output, protocol=2)

def project_object(obj,*attributes):
	out={}
	for a in attributes:
		out[a]=getattr(obj,a)
	return out
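To check that the helpers behave as intended without waiting for a full run, they can be exercised on a small stand-in object. In this sketch fake_scan is a made-up replacement for the Talos result, and load_object is the loader from the inspection script. The lambda attribute shows one plausible benefit of selecting attributes first: anything that cannot be pickled can simply be left out.

```python
import os
import pickle
import tempfile
from types import SimpleNamespace

def save_object(obj, filename):
    with open(filename, 'wb') as output:
        pickle.dump(obj, output, protocol=2)

def load_object(filename):
    with open(filename, 'rb') as f:
        return pickle.load(f)

def project_object(obj, *attributes):
    # Copy only the named attributes into a plain dict.
    return {a: getattr(obj, a) for a in attributes}

# Hypothetical stand-in for the object returned by talos.Scan.
fake_scan = SimpleNamespace(
    params={'dropout': [0.05]},
    details='demo run',
    model=lambda: None,  # lambdas cannot be pickled, so we skip this one
)

selected = project_object(fake_scan, 'params', 'details')
path = os.path.join(tempfile.mkdtemp(), 'demo.pickle')
save_object(selected, path)
restored = load_object(path)
print(restored)
```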

To analyze the output I made another script I called example_inspect.py with the following code

#!/usr/bin/env python3

import pickle
import pandas as pd
from prettytable import PrettyTable, PLAIN_COLUMNS

def save_object(obj, filename):
        with open(filename, 'wb') as output:
                pickle.dump(obj, output, protocol=2)

def load_object(filename):
        with open(filename, 'rb') as f:
                return pickle.load(f)

def print_hyperparameter_search_stats(t):
        print(" *** params: ",{ p:(v if len(v)<200 else [v[0],v[1],v[2],'...',v[-1]]) for p,v in t['params'].items()})
        print()
        print(" *** data ",type(t['data']),len(t['data']))
        print(t['data'].sort_values('val_acc',ascending=False).to_string())
        print()
        distinct_data=t['data']
        nunique = distinct_data.apply(pd.Series.nunique)
        cols_to_drop = nunique[nunique == 1].index
        distinct_data = distinct_data.drop(cols_to_drop, axis=1)
        print(nunique,cols_to_drop)
        print(" *** distinct data ",type(distinct_data),len(distinct_data))
        print(distinct_data.sort_values('val_acc',ascending=False).to_string())
        print()
        print(" *** details ",type(t['details']),len(t['details']))
        print(t['details'])
        print()

tt = load_object('example.pickle')

print(tt['details'])

for ttt in tt['round_history']:
        table = PrettyTable()
        table.set_style(PLAIN_COLUMNS)
        iterations=max([len(x) for x in ttt.values()])
        table.add_column('epoch',range(1,iterations+1))
        for key,val in sorted(ttt.items()):
          # keep the values in epoch order so each row matches its epoch
          table.add_column(key, list(val))

        print(table)

print_hyperparameter_search_stats(tt)

Running this gets us (among other things) the following table (sorted on val acc):

val loss	val acc	loss	acc	conv depth first neuron	conv depth last neuron	conv hidden layers	conv size first neuron	conv size last neuron	first neuron	hidden layers	last neuron
0.357977	0.910370	0.144818	0.95325	60	60	3	7	5	32	1	64
0.304265	0.902963	0.237622	0.92300	60	60	3	5	3	64	3	64
0.317458	0.901296	0.206647	0.93250	60	40	3	5	5	64	2	64
0.407395	0.900926	0.016828	0.99700	60	60	2	5	5	64	1	64
0.359424	0.898519	0.211113	0.93300	40	60	3	5	5	32	2	32
0.374605	0.888889	0.197840	0.93875	60	60	2	7	5	32	2	64
0.400512	0.879444	0.160062	0.94700	20	20	3	7	5	64	1	64
0.430621	0.876111	0.036848	0.99050	40	40	1	5	5	64	1	64
0.442248	0.873889	0.110187	0.96850	60	40	1	7	5	64	3	64
0.458507	0.872037	0.115722	0.96400	60	60	1	7	3	64	3	64
0.458475	0.868704	0.076465	0.97725	60	20	1	7	3	32	1	64
0.513024	0.867963	0.060912	0.98375	40	40	1	7	3	64	2	32
0.425103	0.864630	0.221249	0.92350	20	40	2	7	3	64	3	64
0.451033	0.863889	0.179650	0.94750	20	20	2	7	3	64	1	64
0.469304	0.862222	0.206574	0.93975	60	20	1	7	3	32	3	32
0.453625	0.861852	0.088379	0.97775	40	20	2	7	3	64	1	32
0.444886	0.861481	0.128833	0.96150	60	40	1	7	3	32	1	64
0.507459	0.858148	0.315459	0.90150	40	60	2	5	3	64	2	32
0.486624	0.852963	0.354109	0.89050	20	20	2	5	5	64	1	64
0.485049	0.851111	0.145122	0.96050	20	20	1	5	5	64	1	64
0.559950	0.849815	0.027219	0.99625	20	40	1	7	5	64	1	32
0.494588	0.847963	0.150461	0.95500	20	60	1	5	3	64	1	32
0.548905	0.824444	0.496872	0.83700	20	40	1	5	5	32	2	32
0.613613	0.807037	0.564718	0.80950	60	60	1	7	3	32	2	32
0.626898	0.798519	0.563555	0.81325	60	40	1	7	5	32	2	64
1.310021	0.495741	1.370861	0.46675	40	20	1	7	5	32	3	32
2.302581	0.114630	2.302587	0.09625	40	60	1	7	5	32	2	64
2.302576	0.114630	2.302588	0.10000	60	60	1	5	5	32	3	32
14.327196	0.111111	14.506286	0.10000	60	20	1	7	5	64	1	32
2.302568	0.110556	2.302587	0.09900	20	40	3	5	3	32	3	64

As can be seen from the above table our best result of this run is 91% validation accuracy, only a small improvement over the 90% we got earlier. This can be either because I was lucky with my initial guess of parameters or because we are changing the wrong parameters in the model in our experiment. As this is meant to demonstrate the technique I will not delve deeper into this in this article. Despite this I hope this gives you an overview of how hyperparameter optimization can be realized.
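One thing that does stand out in the table is the handful of runs at the bottom with a loss of about 2.3026 and an accuracy of about 10%. EuroSat has 10 classes, and a model that assigns equal probability to every class gets a cross-entropy loss of ln(10), so these runs most likely never started learning at all. A small sketch of that check, using a few (val loss, val acc) pairs copied from the table. The helper name is_untrained and the tolerance are made up for this example.

```python
import math

num_classes = 10  # EuroSat has 10 land use classes

# Cross-entropy of a uniform guess over all classes.
chance_loss = -math.log(1.0 / num_classes)
print(chance_loss)

# (val_loss, val_acc) pairs copied from the table above.
rows = [
    (0.357977, 0.910370),
    (1.310021, 0.495741),
    (2.302581, 0.114630),
    (2.302576, 0.114630),
    (2.302568, 0.110556),
]

def is_untrained(val_loss, tol=1e-3):
    # Hypothetical helper: loss within `tol` of the uniform-guess loss.
    return abs(val_loss - chance_loss) < tol

stuck = [row for row in rows if is_untrained(row[0])]
print(stuck)
```

Filtering out such runs before comparing hyperparameters avoids drawing conclusions from models that failed to train.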

Our new source now looks like this (example.py) and the inspect code like this (example_inspect.py). The generators code is unchanged.

Feel free to use this code for any and all purposes. Consider it in the public domain or, if that is not workable for you, use it under the terms of the MIT License.
