A blog about programming, data and the languages of art and technology

Posts about programming

Hyperparameter optimization with Keras generators and Talos

Published:

In my last post about training on multispectral images I was able to improve my classification accuracy on the EuroSat dataset from 75% to 90%, using a very simple model. While 90% is much better, there are likely further improvements possible by refining the model.

When seeking to improve the model there are two overall strategies one might attempt. One is to analyze the problem and try to find a model that nicely adapts to that understanding; this is theoretically nice, but the promise of machine learning is that the computer can do some of this work for us. The other approach is to try different combinations and see what works. Doing this manually is rather tedious, so we can let the computer do it for us. This approach of letting the computer explore different ways of setting up our model is called hyperparameter optimization, where "hyperparameter" refers to things like the number of layers, the choice of activation functions and the number of nodes per layer, to differentiate them from the ordinary parameters, which are the weights of the model. Analysis and hyperparameter optimization can of course be combined; it can be very powerful to alternate them, using analysis and understanding of the problem to create the broad strokes of the model and then hyperparameter optimization to find the exact parameter values within that model.

In this article I will extend my earlier code for classifying the EuroSat dataset and show how you can use hyperparameter optimization to help find a better model. I will use a library called Talos to do this, but it can be done fairly easily without any dependencies as well if there is a reason to.
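To make that last point concrete before bringing in Talos, here is a minimal dependency-free sketch of random search over a parameter space. Everything in it is illustrative: build_and_train is a hypothetical stand-in for the model construction and training code shown later, and the parameter names and values are placeholders.

import random

def build_and_train(choice):
    # hypothetical stand-in: build a model from the chosen values,
    # train it, and return its validation accuracy
    ...

params = {                       # illustrative search space
    'hidden_layers': [1, 2, 3],
    'first_neuron': [32, 64],
    'dropout': [0.05, 0.1],
}

best_acc, best_params = 0.0, None
for _ in range(30):              # like a round_limit: sample 30 random combinations
    choice = {k: random.choice(v) for k, v in params.items()}
    val_acc = build_and_train(choice)
    if val_acc > best_acc:
        best_acc, best_params = val_acc, choice
print(best_acc, best_params)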

This article is intended to demonstrate how to write code for hyperparameter optimization; it is not an attempt to create an optimal solution to the classification of the EuroSat images. For much better results on that task you can read the article EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification by Patrick Helber, Benjamin Bischke, Andreas Dengel and Damian Borth.

To implement hyperparameter optimization we need to define our parameter space and then tell our code to explore it, in our case using the Talos library.

In this example we define the parameters with the following Python code:

params = {
    'epoch': [120],
    'batch_size': [32],
    'activation': [relu],
    # convolution layers in the beginning
    'conv_hidden_layers': [1,2,3],
    'conv_depth_shape': ['brick'],
    'conv_size_shape': ['brick'],
    'conv_depth_first_neuron': [20,40,60],
    'conv_depth_last_neuron': [20,40,60],
    'conv_size_first_neuron': [5,7],
    'conv_size_last_neuron': [3,5],
    # fully connected layers at the end
    'first_neuron': [32,64],
    'last_neuron': [32,64],
    'shapes': ['brick'],
    'hidden_layers': [1,2,3],
    'dropout': [0.05]
}
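Note the size of the space these lists define: the full grid is the product of the list lengths. A quick check (a sketch, using the params dict defined above):

from functools import reduce

n_combinations = reduce(lambda a, b: a * b, [len(v) for v in params.values()])
print(n_combinations)  # 1296, so a round_limit of 30 samples only a small fraction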

And then pass it to Talos like this:

verbose=True
round_limit=30 # NOTE Set this to however many rounds you want to test with
model=create_model(training_set,validation_set,verbose)
dummyX,dummyY=training_set.__getitem__(0)
testX,testY=validation_set.__getitem__(0)
validation_set.on_epoch_end()
tt = talos.Scan(
    x=dummyX,
    y=dummyY,
    params=params,
    model=model,
    x_val=testX,
    y_val=testY,
    experiment_name='example.csv',
    print_params=True,
    round_limit=round_limit
)

Note the call to the create_model function. This is where we have moved our Keras model definition; it is defined like this:

class __Config__(object): pass
config=__Config__()
config.optimizer=Adam
config.optimizer_parameters={'lr':0.0001,'decay':0.001}
config.loss='categorical_crossentropy'
config.metric=['accuracy']

def create_model(training_set,validation_set,verbose=False):
    def _create_conv_shape_(params):
        def shape(params):
            if params['hidden_layers']==1:
                return [params['first_neuron']]
            if params['hidden_layers']==2:
                return [params['first_neuron'],params['last_neuron']]
            else:
                params=params.copy()
                params['hidden_layers']-=2
                s_list=network_shape.network_shape(params,params['last_neuron'])
                return [params['first_neuron'],*s_list,params['last_neuron']]
        conv_depth_params={
            'hidden_layers': params['conv_hidden_layers'],
            'shapes': params['conv_depth_shape'],
            'first_neuron': params['conv_depth_first_neuron'],
            'last_neuron': params['conv_depth_last_neuron'],
        }
        conv_size_params={
            'hidden_layers': params['conv_hidden_layers'],
            'shapes': params['conv_size_shape'],
            'first_neuron': params['conv_size_first_neuron'],
            'last_neuron': params['conv_size_last_neuron'],
        }
        conv_depth_shape = shape(conv_depth_params)
        conv_size_shape = shape(conv_size_params)
        conv_shape=zip(conv_depth_shape,conv_size_shape)
        return conv_shape

    def model(dummyXtrain,dummyYtrain,dummyXval,dummyYval,params):
        conv_shape=_create_conv_shape_(params)
        model = Sequential()
        for i,(depth,size) in enumerate(conv_shape):
            if i==0:
                model.add(Conv2D(depth, size, input_shape=training_set.shape))
            else:
                model.add(Conv2D(depth, size))
            model.add(Activation('relu'))
        model.add(Flatten())
        hidden_layers(model, params, params['last_neuron'])
        model.add(Dense(training_set.num_classes))
        model.add(Activation('softmax'))

        global config
        optimizer=config.optimizer(**config.optimizer_parameters)
        model.compile(loss=config.loss, optimizer=optimizer, metrics=config.metric)

        training_set.batch_size=params['batch_size']
        validation_set.batch_size=params['batch_size']
        history = model.fit_generator(
            training_set,
            validation_data=validation_set,
            epochs=params['epoch'],
            verbose=int(params['verbose']),
        )
        return history,model
    return model

While this is a bit more complex than our model definition was previously, note that half the code exists just to let us vary the number of convolutional layers, something that Talos at the time of writing does not automate.

As this runs the training over and over again, the code takes quite a while to complete (for me it ran overnight), so I think it is prudent to save the results of the run to a file rather than printing them directly. Doing this allows us to analyze the output in new ways without having to rerun the code. To do this I have defined some helper functions that select the parts of the Talos output (stored in the tt object) to keep and write them to disk. The call looks like this:

t = project_object(tt,'params','saved_models','saved_weights','data','details','round_history')
save_object(t,'example.pickle')

And the helper functions are implemented like this:

def save_object(obj, filename):
    with open(filename, 'wb') as output:
        pickle.dump(obj, output, protocol=2)

def project_object(obj,*attributes):
    out={}
    for a in attributes:
        out[a]=getattr(obj,a)
    return out

To analyze the output I made another script, called example_inspect.py, with the following code:

#!/usr/bin/env python3
import pickle
import pandas as pd
from prettytable import PrettyTable, PLAIN_COLUMNS

def save_object(obj, filename):
    with open(filename, 'wb') as output:
        pickle.dump(obj, output, protocol=2)

def load_object(filename):
    with open(filename, 'rb') as f:
        return pickle.load(f)

def print_hyperparameter_search_stats(t):
    print(" *** params: ",{ p:(v if len(v)<200 else [v[0],v[1],v[2],'...',v[-1]]) for p,v in t['params'].items()})
    print()
    print(" *** data ",type(t['data']),len(t['data']))
    print(t['data'].sort_values('val_acc',ascending=False).to_string())
    print()
    distinct_data=t['data']
    nunique = distinct_data.apply(pd.Series.nunique)
    cols_to_drop = nunique[nunique == 1].index
    distinct_data = distinct_data.drop(cols_to_drop, axis=1)
    print(nunique,cols_to_drop)
    print(" *** distinct data ",type(distinct_data),len(distinct_data))
    print(distinct_data.sort_values('val_acc',ascending=False).to_string())
    print()
    print(" *** details ",type(t['details']),len(t['details']))
    print(t['details'])
    print()

tt = load_object('example.pickle')
print(tt['details'])
for ttt in tt['round_history']:
    table = PrettyTable()
    table.set_style(PLAIN_COLUMNS)
    iterations=max([len(x) for x in ttt.values()])
    table.add_column('epoch',range(1,iterations+1))
    for key,val in sorted(ttt.items()):
        table.add_column(key, sorted(val))
    print(table)
print_hyperparameter_search_stats(tt)

Running this gets us (among other things) the following table (sorted on val_acc):

val_loss val_acc loss acc conv_depth_first_neuron conv_depth_last_neuron conv_hidden_layers conv_size_first_neuron conv_size_last_neuron first_neuron hidden_layers last_neuron
0.357977 0.910370 0.144818 0.95325 60 60 3 7 5 32 1 64
0.304265 0.902963 0.237622 0.92300 60 60 3 5 3 64 3 64
0.317458 0.901296 0.206647 0.93250 60 40 3 5 5 64 2 64
0.407395 0.900926 0.016828 0.99700 60 60 2 5 5 64 1 64
0.359424 0.898519 0.211113 0.93300 40 60 3 5 5 32 2 32
0.374605 0.888889 0.197840 0.93875 60 60 2 7 5 32 2 64
0.400512 0.879444 0.160062 0.94700 20 20 3 7 5 64 1 64
0.430621 0.876111 0.036848 0.99050 40 40 1 5 5 64 1 64
0.442248 0.873889 0.110187 0.96850 60 40 1 7 5 64 3 64
0.458507 0.872037 0.115722 0.96400 60 60 1 7 3 64 3 64
0.458475 0.868704 0.076465 0.97725 60 20 1 7 3 32 1 64
0.513024 0.867963 0.060912 0.98375 40 40 1 7 3 64 2 32
0.425103 0.864630 0.221249 0.92350 20 40 2 7 3 64 3 64
0.451033 0.863889 0.179650 0.94750 20 20 2 7 3 64 1 64
0.469304 0.862222 0.206574 0.93975 60 20 1 7 3 32 3 32
0.453625 0.861852 0.088379 0.97775 40 20 2 7 3 64 1 32
0.444886 0.861481 0.128833 0.96150 60 40 1 7 3 32 1 64
0.507459 0.858148 0.315459 0.90150 40 60 2 5 3 64 2 32
0.486624 0.852963 0.354109 0.89050 20 20 2 5 5 64 1 64
0.485049 0.851111 0.145122 0.96050 20 20 1 5 5 64 1 64
0.559950 0.849815 0.027219 0.99625 20 40 1 7 5 64 1 32
0.494588 0.847963 0.150461 0.95500 20 60 1 5 3 64 1 32
0.548905 0.824444 0.496872 0.83700 20 40 1 5 5 32 2 32
0.613613 0.807037 0.564718 0.80950 60 60 1 7 3 32 2 32
0.626898 0.798519 0.563555 0.81325 60 40 1 7 5 32 2 64
1.310021 0.495741 1.370861 0.46675 40 20 1 7 5 32 3 32
2.302581 0.114630 2.302587 0.09625 40 60 1 7 5 32 2 64
2.302576 0.114630 2.302588 0.10000 60 60 1 5 5 32 3 32
14.327196 0.111111 14.506286 0.10000 60 20 1 7 5 64 1 32
2.302568 0.110556 2.302587 0.09900 20 40 3 5 3 32 3 64

As can be seen from the table, the best result of this run is 91% validation accuracy, only a small improvement over the 90% we got earlier. This could be because I was lucky with my initial guess of parameters, or because we are varying the wrong parameters of the model in this experiment. As this article is meant to demonstrate the technique I will not delve deeper into that here. I nevertheless hope this gives you an overview of how hyperparameter optimization can be realized.

Our new source now looks like this (example.py) and the inspect code like this (example_inspect.py). The generators code is unchanged.

Feel free to use this code for any and all purposes; consider it in the public domain, or if that is not workable for you, you can use it under the terms of the MIT License.

Training on multispectral images using Keras

Published:

Edited:

In my recent post about using Keras generators I was able to achieve 75% classification accuracy on the EuroSat dataset using a very simple model. While there is a lot that could be done to improve the model, there is a simple change that can be made without the analysis work an improved model would need.

In my generators post I elected to use the JPEG variant of the dataset, so as not to introduce too many new concepts into that post. Alternatively, we can use the multispectral TIFF images from the dataset, thus gaining access to much more information for the machine learning to base its conclusions on.

This turned out to be relatively simple to do, which surprised me as very little information on it was available online; I mostly found blog posts of people asking how to get it working.

Starting with the code in my post on generators (generators.py, example.py), we can simply replace the read_image function and the code will be able to process multispectral images. Code below:

import numpy as np
import rasterio

read_image_cache={}
def read_image(path, rescale=None):
    key="{},{}".format(path,rescale)
    if key in read_image_cache:
        return read_image_cache[key]
    else:
        with rasterio.open(path) as img:
            data=img.read()
        data=np.moveaxis(data,0,-1)
        if rescale!=None:
            data=data*rescale
        read_image_cache[key]=data
        return data

This code stops using the Keras load_img function and instead uses the Rasterio library to read images directly into numpy arrays. The function returns a 3D array with a depth equal to the number of bands in the image.
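To see what that means in practice, here is a quick check; the path is a hypothetical example from the TIFF variant of the dataset, and the expected shape assumes the 13-band Sentinel-2 tiles EuroSat is built from:

# hypothetical path into the multispectral (TIFF) EuroSat data
data = read_image('data/EuroSat/tif/Forest/Forest_1.tif')
print(data.shape)  # expected (64, 64, 13): 64x64 pixels, 13 Sentinel-2 bands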

Making that change and running the same test as in the generators post, we get the following results:

Epoch 1/120 125/125 [==========] - 49s 394ms/step - loss: 2.7323 - acc: 0.2477 - val_loss: 1.7014 - val_acc: 0.3857
Epoch 2/120 125/125 [==========] - 49s 392ms/step - loss: 1.3841 - acc: 0.4800 - val_loss: 1.2359 - val_acc: 0.5559
Epoch 3/120 125/125 [==========] - 49s 393ms/step - loss: 1.0834 - acc: 0.5998 - val_loss: 1.1012 - val_acc: 0.5928
Epoch 4/120 125/125 [==========] - 49s 392ms/step - loss: 0.8800 - acc: 0.6778 - val_loss: 0.8057 - val_acc: 0.7107
Epoch 5/120 125/125 [==========] - 49s 393ms/step - loss: 0.7929 - acc: 0.7115 - val_loss: 0.7359 - val_acc: 0.7394
Epoch 6/120 125/125 [==========] - 49s 392ms/step - loss: 0.7211 - acc: 0.7380 - val_loss: 0.7304 - val_acc: 0.7544
Epoch 7/120 125/125 [==========] - 49s 393ms/step - loss: 0.6667 - acc: 0.7578 - val_loss: 0.7604 - val_acc: 0.7031
Epoch 8/120 125/125 [==========] - 49s 393ms/step - loss: 0.6208 - acc: 0.7830 - val_loss: 0.6004 - val_acc: 0.7833
Epoch 9/120 125/125 [==========] - 49s 392ms/step - loss: 0.6095 - acc: 0.7867 - val_loss: 0.6019 - val_acc: 0.7885
Epoch 10/120 125/125 [==========] - 49s 393ms/step - loss: 0.5913 - acc: 0.7905 - val_loss: 0.5670 - val_acc: 0.7961
...
Epoch 90/120 125/125 [==========] - 48s 384ms/step - loss: 0.2038 - acc: 0.9375 - val_loss: 0.3243 - val_acc: 0.8854
Epoch 91/120 125/125 [==========] - 48s 382ms/step - loss: 0.2064 - acc: 0.9315 - val_loss: 0.3140 - val_acc: 0.8943
Epoch 92/120 125/125 [==========] - 48s 384ms/step - loss: 0.2059 - acc: 0.9325 - val_loss: 0.3232 - val_acc: 0.8870
Epoch 93/120 125/125 [==========] - 48s 382ms/step - loss: 0.1994 - acc: 0.9345 - val_loss: 0.3165 - val_acc: 0.8900
Epoch 94/120 125/125 [==========] - 48s 382ms/step - loss: 0.2030 - acc: 0.9375 - val_loss: 0.3013 - val_acc: 0.8970
Epoch 95/120 125/125 [==========] - 48s 381ms/step - loss: 0.1952 - acc: 0.9400 - val_loss: 0.3164 - val_acc: 0.8917
Epoch 96/120 125/125 [==========] - 48s 381ms/step - loss: 0.1961 - acc: 0.9380 - val_loss: 0.3295 - val_acc: 0.8878
Epoch 97/120 125/125 [==========] - 48s 381ms/step - loss: 0.2003 - acc: 0.9387 - val_loss: 0.3145 - val_acc: 0.8920
Epoch 98/120 125/125 [==========] - 48s 381ms/step - loss: 0.1886 - acc: 0.9400 - val_loss: 0.3096 - val_acc: 0.8926
Epoch 99/120 125/125 [==========] - 48s 381ms/step - loss: 0.1983 - acc: 0.9323 - val_loss: 0.3287 - val_acc: 0.8907
Epoch 100/120 125/125 [==========] - 48s 380ms/step - loss: 0.1923 - acc: 0.9338 - val_loss: 0.3190 - val_acc: 0.8887
Epoch 101/120 125/125 [==========] - 48s 382ms/step - loss: 0.1927 - acc: 0.9313 - val_loss: 0.3107 - val_acc: 0.8957
Epoch 102/120 125/125 [==========] - 47s 376ms/step - loss: 0.1788 - acc: 0.9375 - val_loss: 0.3131 - val_acc: 0.8941
Epoch 103/120 125/125 [==========] - 47s 377ms/step - loss: 0.1932 - acc: 0.9370 - val_loss: 0.3008 - val_acc: 0.8978
Epoch 104/120 125/125 [==========] - 48s 380ms/step - loss: 0.1894 - acc: 0.9405 - val_loss: 0.3049 - val_acc: 0.9019
Epoch 105/120 125/125 [==========] - 47s 377ms/step - loss: 0.1821 - acc: 0.9420 - val_loss: 0.3138 - val_acc: 0.8915
Epoch 106/120 125/125 [==========] - 47s 379ms/step - loss: 0.1811 - acc: 0.9400 - val_loss: 0.3159 - val_acc: 0.8924
Epoch 107/120 125/125 [==========] - 47s 375ms/step - loss: 0.1797 - acc: 0.9400 - val_loss: 0.3079 - val_acc: 0.8972
Epoch 108/120 125/125 [==========] - 47s 378ms/step - loss: 0.1826 - acc: 0.9382 - val_loss: 0.3215 - val_acc: 0.8935
Epoch 109/120 125/125 [==========] - 47s 378ms/step - loss: 0.1798 - acc: 0.9393 - val_loss: 0.3031 - val_acc: 0.8972
Epoch 110/120 125/125 [==========] - 47s 376ms/step - loss: 0.1763 - acc: 0.9455 - val_loss: 0.3588 - val_acc: 0.8776
Epoch 111/120 125/125 [==========] - 47s 379ms/step - loss: 0.1723 - acc: 0.9445 - val_loss: 0.3039 - val_acc: 0.8965
Epoch 112/120 125/125 [==========] - 47s 376ms/step - loss: 0.1822 - acc: 0.9407 - val_loss: 0.3099 - val_acc: 0.8978
Epoch 113/120 125/125 [==========] - 47s 378ms/step - loss: 0.1831 - acc: 0.9412 - val_loss: 0.3140 - val_acc: 0.8917
Epoch 114/120 125/125 [==========] - 47s 376ms/step - loss: 0.1674 - acc: 0.9455 - val_loss: 0.3166 - val_acc: 0.8898
Epoch 115/120 125/125 [==========] - 48s 381ms/step - loss: 0.1734 - acc: 0.9475 - val_loss: 0.3126 - val_acc: 0.8965
Epoch 116/120 125/125 [==========] - 47s 377ms/step - loss: 0.1677 - acc: 0.9430 - val_loss: 0.3025 - val_acc: 0.8954
Epoch 117/120 125/125 [==========] - 47s 377ms/step - loss: 0.1788 - acc: 0.9463 - val_loss: 0.3092 - val_acc: 0.8920
Epoch 118/120 125/125 [==========] - 47s 377ms/step - loss: 0.1622 - acc: 0.9472 - val_loss: 0.2990 - val_acc: 0.9004
Epoch 119/120 125/125 [==========] - 47s 376ms/step - loss: 0.1629 - acc: 0.9465 - val_loss: 0.3225 - val_acc: 0.8900
Epoch 120/120 125/125 [==========] - 47s 378ms/step - loss: 0.1800 - acc: 0.9397 - val_loss: 0.3025 - val_acc: 0.8981

As you can see from the program output we are getting a much better result of approximately 90%. We can also see that from about epoch 100 we are mostly oscillating around this value, which tells us that we are likely at the limit of how good our simple model can become, necessitating a more thought-out one for better results (potentially more and/or better training data might also be needed). We could keep running for more epochs but that would most likely lead to overtraining.
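One common way to guard against that, not used in the code of this post, is Keras' EarlyStopping callback, which stops training once the validation loss stops improving. A minimal sketch, assuming the training_set and validation_set generators from this post:

from keras.callbacks import EarlyStopping

# stop when val_loss has not improved for 10 epochs, keeping the best weights
early = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
history = model.fit_generator(
    training_set,
    validation_data=validation_set,
    epochs=120,
    callbacks=[early],
)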

Our new source now looks like this (example.py). The generators code is unchanged.

Feel free to use this code for any and all purposes; consider it in the public domain, or if that is not workable for you, you can use it under the terms of the MIT License.

Utilizing generators to use Keras training with existing file structure

Published:

I recently wanted to use Keras, a deep learning framework, to solve an image classification problem and ran into an issue. Keras' built-in image loading functions assume that my training data is organized in a single folder with a subfolder for each class of images. This is then replicated for the validation data unless Keras' automatic validation split is used. In my case the data were spread out over several folders (an artifact of how the data was sourced) and it would have been impractical to copy the data, which was already taking up a significant part of the total disk space on the development system.

The solution to this is to use Keras generators. There are two kinds of generators in Keras: either a simple Python generator using yield, or a class inheriting from keras.utils.Sequence. The latter is the more flexible one and is what this post focuses on.
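For contrast, here is a minimal sketch of the first kind, a plain Python generator over in-memory arrays; X, y and batch_size are illustrative names, not part of the code in this post:

def simple_batches(X, y, batch_size=32):
    # Keras expects such a generator to loop forever; it draws
    # steps_per_epoch batches from it each epoch
    while True:
        for i in range(0, len(X), batch_size):
            yield X[i:i+batch_size], y[i:i+batch_size]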

My initial attempt did work but was rather messy to use, and when I needed to extend it to handle splitting the data into three parts (test, validation and training), doing that in the original design would have been very messy. So I took a step back and figured out that I wanted the following operations:

  • create empty generator
  • add a directory with files to the generator
    this could be extended to add data from other sources or directory structures
  • shuffle the data
  • split the generator into new generators using a list of split-points (real numbers between 0 and 1)
  • a way to get the class names of the generator
  • a way to get the filename of images yielded by the generator

Of these, the key operations are the splitting and the mapping of generated images to filenames. The splitting is important as it lets us control how many sets we split our data into and how large they are, allowing for training, validation and test sets or more. The mapping of images back to filenames is important as it allows us to use the generators for prediction, as well as to generate lists of images that the network gets wrong, for manual analysis of the network's behaviour.
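As an example of that last point, here is a sketch of how such an error list could be produced using the generator API described below (get_filenames and __getitem__); it assumes a trained model and one-hot encoded labels:

X, y = test_set[0]                      # first batch of the current epoch
pred = model.predict(X).argmax(axis=1)  # predicted class numbers
true = y.argmax(axis=1)                 # actual class numbers
wrong = [i for i in range(len(pred)) if pred[i] != true[i]]
print(test_set.get_filenames(wrong))    # filenames of the misclassified images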

In addition to these, some further operations were included later as their need became apparent:

  • A function to set constructor properties after the fact, such as verbosity
  • A function to preload the images into a cache
  • Controls for the batch size used
  • Controls for restricting the maximum number of images per class each epoch

While not central to the functioning of the generator, these proved necessary in practical application.


To create a generator based on keras.utils.Sequence we are required to provide a few methods:

class SplitSetImageGenerator(keras.utils.Sequence):
    def __getitem__(self,index):
        # gets the batch for the supplied index
        # return a tuple (numpy array of images, numpy array of labels) or None at epoch end
    def __len__(self):
        # gets the number of batches
        # return the number of batches in this epoch (do not change in the middle of an epoch)
    def on_epoch_end(self):
        # performs auto shuffle if enabled
        # do what we need to do between epochs

Adding our methods we arrive at:

class SplitSetImageGenerator(keras.utils.Sequence):
    def __init__(self):
        # do initialization
    def set(self,**attributes):
        # set some config property, eg batch_size, verbose or max_per_class_and_epoch
    def add_dir(self,image_dir_reader,*paths):
        # add the directories in paths to this generator as image sources
        # image_dir_reader should be a function returning a tuple of lists:
        #   names - filenames of images
        #   classes - class of each image as a number
        #   classnames - names of all the classes in the directory
        #   classindices - companion list to classnames mapping each name to its number
    def shuffle(self):
        # shuffle the contents without losing filename associations
    def preload(self):
        # load all images, which will cache them if caching is configured
    def split(self,*splitpoints):
        # splits the generator at the provided fractions of all images; duplicate fractions
        # generate empty child generators, and non-increasing fractions are disallowed
    def get_filenames(self,indices):
        # returns the filenames of the images corresponding to the indices in the current epoch
    def __getitem__(self,index):
        # gets the batch for the supplied index
        # return a tuple (numpy array of images, numpy array of labels) or None at epoch end
    def __len__(self):
        # gets the number of batches
        # return the number of batches in this epoch (do not change in the middle of an epoch)
    def on_epoch_end(self):
        # performs auto shuffle if enabled
        # do what we need to do between epochs

With these methods in place we can start to write useful code. If we adopt the convention that all methods except split, get_filenames and the methods from keras.utils.Sequence return self, we can now do:

training,validation=SplitSetImageGenerator().add_dir(*paths).shuffle().preload().split(0.8)
model.fit_generator(training,validation_data=validation,epochs=10)

Once we have this in place we will not add any more external methods. We will, however, define some useful properties on the generator that a user can access. The primary ones are:

  • filenames - a list of all filenames known to the generator
  • classes - a corresponding list of class numbers for each filename
  • classnames - a list where class names can be looked up from class numbers

These are the ones most useful to access. Some further properties, mostly for configuring the behaviour of the generator (using __init__ or the set method), are:

  • batch_size - the number of images returned on each call of __getitem__
  • verbose - to spam or not to spam stdout
  • max_per_class_and_epoch - a limit on how many images of each class to return
  • auto_shuffle - if the generator should be shuffled between epochs
  • scale - a number to scale all pixel values in an image with
  • image_load_function - a function that can load an image into a numpy array
  • image_cache - a cache object that can be passed to the image load function

I think most of these are rather obvious; the one I want to comment on is max_per_class_and_epoch. I added it after running into problems with training: it turned out that I had many more examples of one of my classes, so the training got stuck in a local maximum where it always predicted that class. This option solved that by ensuring that in each epoch the generator produces the same number of images of each class, as long as its value is set lower than the number of images in the smallest class in the training set.
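For instance, balancing the classes could look like the following (the cap of 1000 is an illustrative value, not one from the original runs):

# cap each class at 1000 images per epoch and reshuffle between epochs
training_set.set(max_per_class_and_epoch=1000, auto_shuffle=True)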

I will not go through the implementation in detail, if you are interested you can look at the source yourself. I will however show some examples of how to use the code.


To use the generator some steps are required and others are recommended. The following example shows how to read images from a folder in the same manner as Keras' built-in image data generator and then split that dataset in a consistent way. I will be using the EuroSAT dataset available at https://github.com/phelber/eurosat in this example.

# build the data generators
test_validation_train_split=[0.2,0.4]
test_set,validation_set,training_set=[dataset.set(verbose=False)
    for dataset in SplitSetImageGenerator(image_load_function=read_image,scale=1.0/255)
        .add_dir(image_data_generator_dir_reader,'data/EuroSat/jpg/')
        .shuffle()
        .split(*test_validation_train_split)]

# preload images to speed up training
for s in [validation_set,training_set]:
    s.set(verbose=True).preload().set(verbose=False).shuffle()

As can be seen from the code, we start by creating the image generator, passing it an image load function (to be defined later) and a scale factor (here used to scale pixels into the range 0-1). We then add a directory of data to the generator by passing a reader function (also to be defined) as well as a path to a directory of images. At this point we have a generator capable of being used in training.

In the next step we shuffle the generator to avoid the risk of all the images of some class ending up in the same part of the data when we split into test, training and validation sets. We follow the shuffle by splitting the data, placing the range 0%-20% in the first set, 20%-40% in the next and 40%-100% in the last. We then disable verbosity for all sets and store them as test, validation and training.

The final step is preloading the images in the validation and training set to avoid slowdowns caused by disk access during training.

To make this work we need to define the functions for reading a directory and for reading the individual image files. We do that using the following code.

read_image_cache={}
def read_image(path, rescale=None):
    key="{},{}".format(path,rescale)
    if key in read_image_cache:
        return read_image_cache[key]
    else:
        img=image.load_img(path)
        data=image.img_to_array(img)
        if rescale!=None:
            data=data*rescale
        read_image_cache[key]=data
        return data

# function to return filenames and classes of images
# also returns a list of class names and a list of class indices corresponding to the class names
def image_data_generator_dir_reader(path):
    sys.stdout=sys.stderr # redirect problematic output
    # here we use the keras ImageDataGenerator to get a list of filenames and classes
    ig = image.ImageDataGenerator()
    gen = ig.flow_from_directory(path)
    sys.stdout=sys.__stdout__ # restore stdout
    names=[os.path.normpath(path+'/'+n.replace('\\','/')).replace('\\','/') for n in gen.filenames]
    return (names,gen.classes,*zip(*gen.class_indices.items()))

The first of these functions reads a single image using Keras' load_img function, applies any supplied rescaling and caches the result.

The second function uses the Keras ImageDataGenerator to get filenames and their classes from a directory. If the data is stored in some organisation other than the one handled by Keras' ImageDataGenerator, we only need to supply a function of this type that can read that format to add_dir; the rest of the code keeps working unchanged and no data needs to be reorganized on disk. Also, as was the original motivation, we are not restricted to one call to add_dir but can add many directories if we have several datasets we want to combine. A sketch of a custom reader follows.
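As an illustration, here is a hypothetical reader for a flat directory where the class name is encoded in each filename before an underscore; both the layout and the function are assumptions, but it follows the (names, classes, classnames, classindices) contract described above:

import os

def flat_dir_reader(path):
    # hypothetical layout: path/<classname>_<number>.jpg
    names = [os.path.join(path, f) for f in sorted(os.listdir(path))]
    classnames = sorted({os.path.basename(n).split('_')[0] for n in names})
    index = {c: i for i, c in enumerate(classnames)}
    classes = [index[os.path.basename(n).split('_')[0]] for n in names]
    return (names, classes, classnames, list(range(len(classnames))))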

Having read the data we can then define a simple model and train a network using the code below (full source: example.py).

################### MODEL DEFINITION ###################
# this is not an optimized model, just a simple example
# for good results this model needs some thought
model = Sequential()
model.add(Conv2D(60, 5, input_shape=training_set.shape))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(20, 5, input_shape=training_set.shape))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(120))
model.add(Dense(60))
model.add(Dense(training_set.num_classes))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer=Adam(lr=0.0001,decay=0.001),
              metrics=['accuracy'])

################### TRAINING ###################
history = model.fit_generator(
    training_set,
    validation_data=validation_set,
    epochs=30
)

Running this produces output as follows, and as we can see, even with this basic model we reach validation accuracies around 75%.

Epoch 1/120 125/125 [===========] - 39s 311ms/step - loss: 2.3054 - acc: 0.1133 - val_loss: 2.2371 - val_acc: 0.1220
Epoch 2/120 125/125 [===========] - 39s 312ms/step - loss: 2.0607 - acc: 0.1675 - val_loss: 1.8251 - val_acc: 0.2598
Epoch 3/120 125/125 [===========] - 39s 311ms/step - loss: 1.6701 - acc: 0.3575 - val_loss: 1.5512 - val_acc: 0.4070
Epoch 4/120 125/125 [===========] - 39s 310ms/step - loss: 1.4910 - acc: 0.4300 - val_loss: 1.4137 - val_acc: 0.4667
Epoch 5/120 125/125 [===========] - 39s 312ms/step - loss: 1.4091 - acc: 0.4610 - val_loss: 1.3520 - val_acc: 0.5046
Epoch 6/120 125/125 [===========] - 40s 318ms/step - loss: 1.3175 - acc: 0.5067 - val_loss: 1.3070 - val_acc: 0.4922
Epoch 7/120 125/125 [===========] - 39s 314ms/step - loss: 1.3077 - acc: 0.5090 - val_loss: 1.3010 - val_acc: 0.4761
Epoch 8/120 125/125 [===========] - 39s 315ms/step - loss: 1.2480 - acc: 0.5327 - val_loss: 1.2288 - val_acc: 0.5443
Epoch 9/120 125/125 [===========] - 39s 311ms/step - loss: 1.2073 - acc: 0.5588 - val_loss: 1.2555 - val_acc: 0.5157
Epoch 10/120 125/125 [===========] - 39s 310ms/step - loss: 1.2273 - acc: 0.5618 - val_loss: 1.1627 - val_acc: 0.5794
...
Epoch 110/120 125/125 [===========] - 31s 249ms/step - loss: 0.7036 - acc: 0.7505 - val_loss: 0.7085 - val_acc: 0.7424
Epoch 111/120 125/125 [===========] - 31s 248ms/step - loss: 0.7144 - acc: 0.7535 - val_loss: 0.7177 - val_acc: 0.7413
Epoch 112/120 125/125 [===========] - 31s 249ms/step - loss: 0.7088 - acc: 0.7630 - val_loss: 0.7053 - val_acc: 0.7535
Epoch 113/120 125/125 [===========] - 31s 249ms/step - loss: 0.6910 - acc: 0.7620 - val_loss: 0.6994 - val_acc: 0.7513
Epoch 114/120 125/125 [===========] - 31s 249ms/step - loss: 0.7053 - acc: 0.7518 - val_loss: 0.6969 - val_acc: 0.7531
Epoch 115/120 125/125 [===========] - 31s 249ms/step - loss: 0.6863 - acc: 0.7655 - val_loss: 0.6980 - val_acc: 0.7544
Epoch 116/120 125/125 [===========] - 31s 248ms/step - loss: 0.6859 - acc: 0.7600 - val_loss: 0.7182 - val_acc: 0.7433
Epoch 117/120 125/125 [===========] - 31s 249ms/step - loss: 0.7222 - acc: 0.7460 - val_loss: 0.6948 - val_acc: 0.7528
Epoch 118/120 125/125 [===========] - 31s 248ms/step - loss: 0.7032 - acc: 0.7602 - val_loss: 0.7140 - val_acc: 0.7444
Epoch 119/120 125/125 [===========] - 31s 248ms/step - loss: 0.6917 - acc: 0.7615 - val_loss: 0.6946 - val_acc: 0.7496
Epoch 120/120 125/125 [===========] - 31s 247ms/step - loss: 0.6862 - acc: 0.7562 - val_loss: 0.6945 - val_acc: 0.7502

That's all for this post. I hope to write more about machine learning in the future; if I do, you should be able to find those posts using the tags on this one.

All source code for this post

generator: generators.py
example: example.py

Feel free to use this code for any and all purposes; consider it in the public domain, or if that is not workable for you, you can use it under the terms of the MIT License.

Predicate combinators in input validation

Published:

I have spent some time coding in Python and ran across the problem of parsing command line parameters and validating them. The Python argparse library proved to be great at parsing but at first glance did not provide obvious means for validating the parameters.

It turns out however that there is a feature in the argparse library we can exploit to easily add that.

Consider the following example from the argparse documentation:

import argparse

parser = argparse.ArgumentParser(description='Process some integers.')
parser.add_argument('integers', metavar='N', type=int, nargs='+',
                    help='an integer for the accumulator')
parser.add_argument('--sum', dest='accumulate', action='store_const',
                    const=sum, default=max,
                    help='sum the integers (default: find the max)')

args = parser.parse_args()
print(args.accumulate(args.integers))

Consider especially the parameter type=int to the first argument.

On the surface, what this does is pass a function that converts the argument to the desired type, which was not what I was trying to do. But since this is an arbitrary function call, it is a great place for injecting my own validation code.

The naive approach would be to write a validation function for each parameter, pass it to the type argument, and do the checks there (a sketch of this follows). This is quite good, but it can be improved.
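A minimal sketch of such a per-parameter validator; positive_int is a hypothetical example, and argparse.ArgumentTypeError is the exception type argparse reports cleanly to the user:

def positive_int(value):
    # convert first, then validate
    n = int(value)
    if n <= 0:
        raise argparse.ArgumentTypeError("expected a positive integer")
    return n

parser.add_argument('count', type=positive_int, help='a positive integer')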

Consider this function

def create_checked(predicate,error="ERROR: value does not meet constraints"):
    def fun(value):
        if predicate(value):
            return value
        raise Exception(error)
    return fun

which is then used like this

parser.add_argument('integers', metavar='N', type=create_checked(int), nargs='+',
                    help='an integer for the accumulator')

In this example the situation is actually not improving: we are checking whether the argument can be converted to an int, and if a falsy value is returned we are throwing an exception. This is a bit redundant, as the int function already throws.

But lets consider another example

parser.add_argument('config', metavar='FILENAME', type=create_checked(os.path.isfile),
                    help='a config file')

This is more useful: now we can use standard functions to check whether input files exist, or any other existing function that returns a boolean based on a string such as a pathname (os.path.isdir comes to mind).

We are still not done though.

What if we want to check more than one thing about a parameter, or check that the file does not exist?

Enter the following functions

def And(*predicates):
    def inner(obj):
        for p in predicates:
            if not p(obj):
                return False
        return True
    return inner

def Or(*predicates):
    def inner(obj):
        for p in predicates:
            if p(obj):
                return True
        return False
    return inner

def Not(predicate):
    def inner(obj):
        return not predicate(obj)
    return inner

These functions allow us to combine several checks on the same parameter, or to negate the value of a checking function.

parser.add_argument('output', metavar='FILENAME', type=create_checked(Not(os.path.isfile)),
                    help='an output file')
parser.add_argument('input-zip', metavar='FILENAME', type=create_checked(And(os.path.isfile,valid_zip)),
                    help='an input zip file')

The code above first checks that the output file does not exist, and then, for the input parameter, checks that the file exists and is a valid zip file (assuming the valid_zip function exists); one possible definition is sketched below.
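The post leaves valid_zip undefined; a minimal sketch using the standard library could be:

import zipfile

def valid_zip(path):
    # True if the path points at a readable zip archive
    return zipfile.is_zipfile(path)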

That's all for this post, I hope you find it useful.

Source code for this: create_checked.py

Feel free to use this code for any and all purposes; consider it in the public domain, or if that is not workable for you, you can use it under the terms of the MIT License.

Using decorators to emulate ad-hoc inheritance in Java

Published:

Consider Java code based on the rather common pattern of factories producing instances of some interface you want to use. In the simple case this works fine, but what do you do when you want to modify a method in the returned instance? If there were no factory involved you could just inherit from the class anonymously and override that method, but given that the object is created via a factory, that option is closed to us.

Luckily there is a solution for us in the Java API: the java.lang.reflect.Proxy class's newProxyInstance method. This however has a rather clunky interface where we are expected to handle Method objects and InvocationHandler instances.

We can do better than that.

After a bit of experimenting I came up with the following API

public class Decorator {
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    public static @interface Override {}

    public static <I, T extends I, D> I decorate(T proxyBase, Class<I> asInterface, D decorations)
            throws NoSuchMethodException {
        // ... Implementation
    }
}

Which is used like this

MyInterface m=Decorator.decorate(MyInterfaceFactory.create(), MyInterface.class, new Object(){
    @Decorator.Override
    void close(MyInterface i){
        System.err.println("Closing down");
        i.close();
    }
});
m.close(); // Will print "Closing down"

source code: Decorator.java
test case: Test.java

Enjoy

Feel free to use this code for any and all purposes; consider it in the public domain, or if that is not workable for you, you can use it under the terms of the MIT License.