In my previous post, I showed the support functions necessary to read and save the EMNIST handwriting recognition datasets for use in deep machine learning programs. In this installment I’ll cover the entire working program, line by line, with the result that you’ll be able to visualize individual characters from the testing database, like this:
We can all see that’s a numeral three. Let’s see how we got here.
The Imports
There are several imports needed to make this work. We need some tools from Keras and TensorFlow, of course, and we need NumPy to save and restore the arrays from disk. We also need the matplotlib package to visualize the character as shown above.
from keras.models import Sequential
from keras.layers import *
import numpy as np
import tensorflow as tf
from numpy import argmax
import matplotlib.pyplot as plt
Reading the File Headers
In addition to what we did in the earlier installment of this article, we’ll also add a function to read the “labels” file header. The label file contains one byte per sample that represents the “right answer” for the corresponding digit in the image dataset. So, for example, if the image at index 2002 is a “3”, then the label file will have 0x03 at index 2002. As in the last installment, the main reason to read the headers is to gather sample metadata and return it, along with the starting position of the data, for use in the main program. The image header reader returns all of this in a tuple; the label header reader returns only the starting position.
def read_image_file_header(filename):
    f = open(filename, 'rb')
    int.from_bytes(f.read(4), byteorder='big')  # magic number, discard it
    count = int.from_bytes(f.read(4), byteorder='big')  # number of samples in data set
    rows = int.from_bytes(f.read(4), byteorder='big')  # rows per image
    columns = int.from_bytes(f.read(4), byteorder='big')  # columns per image
    pos = f.tell()  # current position used as offset later when reading data
    f.close()
    return pos, count, rows, columns


def read_label_file_header(filename):
    f = open(filename, 'rb')
    int.from_bytes(f.read(4), byteorder='big')  # magic number
    int.from_bytes(f.read(4), byteorder='big')  # sample count
    pos = f.tell()
    f.close()
    return pos
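The same four big-endian header fields can also be unpacked in a single call. This is only a minimal sketch, assuming the 16-byte image header layout described above; `parse_image_header` and `fake_header` are hypothetical names, and the header bytes are built in memory for illustration rather than read from a real EMNIST file.

```python
import struct


def parse_image_header(header_bytes):
    """Unpack magic number, sample count, rows and columns (hypothetical helper)."""
    # '>IIII' means four unsigned 32-bit integers in big-endian byte order
    magic, count, rows, columns = struct.unpack('>IIII', header_bytes[:16])
    return magic, count, rows, columns


# Synthetic header built in memory for illustration only
fake_header = struct.pack('>IIII', 2051, 5, 28, 28)
print(parse_image_header(fake_header))
```

With the real file, the first 16 bytes read from disk would take the place of `fake_header`, and the data offset would simply be 16.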
Reading the Images and Labels
These matching functions, again for images and their corresponding labels, read the EMNIST dataset as described in the earlier article and save the resulting NumPy arrays to disk. This allows them to be loaded directly for use with Keras. You’ll see that the method of reading is quite similar, except that the image reader reads and constructs a 28×28 array, while the label reader reads a single byte per sample into a 1D list.
def load_save_images(inputfilename, byte_offset, outputfilename, cols, rows, count):
    list_data = []
    infile = open(inputfilename, 'rb')
    infile.seek(byte_offset)
    for n in range(count):
        image_matrix = [[0 for x in range(cols)] for y in range(rows)]
        for r in range(rows):
            for c in range(cols):
                byte = infile.read(1)
                image_matrix[c][r] = float(ord(byte))
        list_data.append(image_matrix)
        # show progress
        if n % 5000 == 0:
            print("... " + str(n))
    infile.close()
    print('converting to numpy array')
    list_data = np.array(list_data, dtype=np.float32)
    print('normalizing')
    list_data = tf.keras.utils.normalize(list_data, axis=1)
    print('saving')
    np.save(outputfilename, list_data)


def load_save_labels(inputfilename, offset, outputfilename, count):
    list_data = []
    infile = open(inputfilename, 'rb')
    infile.seek(offset)
    for n in range(count):
        byte = infile.read(1)
        list_data.append(ord(byte))
        # show progress
        if n % 5000 == 0:
            print("... " + str(n))
    infile.close()
    print('converting to numpy array')
    list_data = np.array(list_data, dtype=np.uint8)
    print('saving')
    np.save(outputfilename, list_data)
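The normalize call in `load_save_images` deserves a quick illustration. As far as I understand it, `tf.keras.utils.normalize` defaults to an L2 norm, so with `axis=1` each slice along axis 1 is scaled to unit length rather than the pixel values simply being divided by 255. The sketch below mimics that behavior in plain NumPy; `l2_normalize` is a hypothetical stand-in and not the library function itself.

```python
import numpy as np


def l2_normalize(x, axis=1):
    """Scale x to unit L2 norm along the given axis (hypothetical stand-in)."""
    norms = np.linalg.norm(x, ord=2, axis=axis, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero slices untouched instead of dividing by zero
    return x / norms


# One tiny 2x2 "image" instead of a real 28x28 sample
batch = np.array([[[3.0, 0.0],
                   [4.0, 0.0]]], dtype=np.float32)
normalized = l2_normalize(batch, axis=1)
print(normalized[0])
```

Here the first column (3, 4) has length 5 and becomes (0.6, 0.8), while the all-zero column stays at zero.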
One improvement over the version posted in the previous article is in the two array conversions, list_data = np.array(list_data, dtype=np.float32) and list_data = np.array(list_data, dtype=np.uint8). Specifying a smaller data type (using the dtype keyword argument) results in arrays that are quite a bit smaller when saved on disk, and which perform just as well for this class of problem.
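To make the size difference concrete, here is a small sketch comparing the in-memory size of one blank 28×28 image stored with NumPy’s default float type against float32, plus a few labels stored as uint8. The arrays are made up for illustration; only the `dtype` argument comes from the code above.

```python
import numpy as np

# One blank 28x28 "image" built from plain Python floats
pixels = [[0.0] * 28 for _ in range(28)]

as_float64 = np.array(pixels)                    # NumPy defaults to float64 for Python floats
as_float32 = np.array(pixels, dtype=np.float32)  # half the bytes per pixel
labels_uint8 = np.array([3, 7, 1], dtype=np.uint8)  # one byte per label

print(as_float64.nbytes, as_float32.nbytes, labels_uint8.nbytes)
```

The float32 version takes 3136 bytes against 6272 for float64, and that factor of two carries over to every sample in the saved file.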
The Main Program
That’s it for the support functions. Next we begin on the main program. Because some of these operations are lengthy, I first define a few boolean variables that let us “turn off” certain sections once they have run successfully. This saves time and disk activity for data that, once computed, will remain static unless the program is changed.
reReadAllData = True
reTrain = True
reEvaluate = True
doPredictions = True
Reading and Saving the Datasets
This is where the most time-consuming code is, as we read each of the datasets and their label files byte by byte and then construct, normalize (for the image data), and save the arrays out to disk. This block is guarded by the boolean reReadAllData so that, once it has run successfully, the variable can be set to False, saving time.
if reReadAllData:
    input_filename = 'emnist-digits-train-images-idx3-ubyte'
    offset, sample_count, rows_per_image, columns_per_image = read_image_file_header(input_filename)
    load_save_images(input_filename, offset, 'images_train_array', columns_per_image, rows_per_image, sample_count)

    input_filename = 'emnist-digits-train-labels-idx1-ubyte'
    offset = read_label_file_header(input_filename)
    load_save_labels(input_filename, offset, 'labels_train_array', sample_count)

    input_filename = 'emnist-digits-test-images-idx3-ubyte'
    offset, sample_count, rows_per_image, columns_per_image = read_image_file_header(input_filename)
    load_save_images(input_filename, offset, 'images_test_array', columns_per_image, rows_per_image, sample_count)

    input_filename = 'emnist-digits-test-labels-idx1-ubyte'
    offset = read_label_file_header(input_filename)
    load_save_labels(input_filename, offset, 'labels_test_array', sample_count)
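Since the byte-by-byte loop is the slow part, it may be worth sketching a vectorized alternative. This is not the approach used in this program, just one possible technique: it lets `np.frombuffer` interpret the raw bytes in one step and swaps the last two axes to mirror the `image_matrix[c][r]` indexing of the loop version. `images_from_bytes` is a hypothetical helper, and the bytes here are synthetic.

```python
import numpy as np


def images_from_bytes(raw, count, rows, cols):
    """Build a (count, cols, rows) float32 array from raw pixel bytes (hypothetical helper)."""
    flat = np.frombuffer(raw, dtype=np.uint8, count=count * rows * cols)
    images = flat.reshape(count, rows, cols)
    # Swap the last two axes to mirror image_matrix[c][r] in the loop version
    return images.transpose(0, 2, 1).astype(np.float32)


# Two synthetic 2x2 "images" instead of real EMNIST data
raw = bytes(range(8))
images = images_from_bytes(raw, count=2, rows=2, cols=2)
print(images[0])
```

With a real file, `raw` would be whatever is read from disk after seeking to the header offset, and the normalize and save steps would follow as before.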
Setting Up the Model
Now we instantiate the Keras model and, if reTrain is True, we load the training data from disk, configure the model, compile it, and fit the data. Finally, in case this works to our satisfaction, we save the Keras/TensorFlow model to disk so that we don’t need to do this again.
model = Sequential()
if reTrain:
    print('training model')
    x_train = np.load('images_train_array.npy')
    y_train = np.load('labels_train_array.npy')
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dense(128, activation='relu'))
    model.add(Dense(10, activation='softmax'))
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    model.fit(x_train, y_train, epochs=3)
    print('saving model to disk')
    model.save('digit_recognizer.h5')
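To get a feel for how small this network is, the number of trainable parameters can be worked out by hand. The sketch below assumes 28×28 inputs flattened to 784 values and the three Dense layers shown above; `dense_params` is a hypothetical helper that counts one weight per input-unit pair plus one bias per unit.

```python
def dense_params(inputs, units):
    """Weights plus biases for one fully connected layer (hypothetical helper)."""
    return inputs * units + units


flattened = 28 * 28           # Flatten turns each 28x28 image into 784 values
layer_sizes = [128, 128, 10]  # the three Dense layers in the model above

total = 0
inputs = flattened
for units in layer_sizes:
    total += dense_params(inputs, units)
    inputs = units  # this layer's output feeds the next layer

print(total)  # 118282
```

That is 100,480 parameters for the first hidden layer, 16,512 for the second, and 1,290 for the output layer.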
Reloading the Model with Testing Data
Next we reload the model from disk and load the testing data, both to evaluate the model and to have data available for the interactive mode that is coming up soon. If reEvaluate is set, we run the model.evaluate() function to see what kind of accuracy we get on the testing dataset based on the earlier training. We’ll see that the model as configured is very strong.
print('loading model from disk')
model = tf.keras.models.load_model('digit_recognizer.h5')

print('loading testing arrays')
x_test = np.load('images_test_array.npy')
y_test = np.load('labels_test_array.npy')

if reEvaluate:
    print('running model.evaluate against test data')
    val_loss, val_acc = model.evaluate(x_test, y_test)
    print('evaluation: ', val_loss, val_acc)
Doing Interactive Predictions
Finally, we want to get the sample count for the testing database again (just to make sure we have it right with all these boolean variables affecting program flow). Then we enter an endless loop of asking for user input, taking the integer index entered, and getting the “guess” for that character from the predictions, as well as the “right answer” from the labels dataset. As a last step, we use the matplotlib imshow function to create a default plot of the 28×28 character data, and plot it in grayscale mode.
Dismissing the plot window when done lets the while loop repeat until “quit” is entered.
pos_image, sample_count, rows_per_image, columns_per_image = read_image_file_header('emnist-digits-test-images-idx3-ubyte')
print(pos_image, sample_count, rows_per_image, columns_per_image)

if doPredictions:
    print('sample size = ' + str(sample_count))
    predictions = model.predict([x_test])
    while True:
        command = input('Enter a sample number to view, "quit" to stop : ')
        if command.upper() == 'QUIT':
            break
        index = int(command)
        guess = argmax(predictions[index])
        confidence = float(predictions[index][guess])
        print('program guesses ' + str(guess) + ' with confidence of ' + '{:.2%}'.format(confidence))
        print('correct answer : ' + str(y_test[index]))
        plt.imshow(x_test[index], cmap='gray')
        plt.show()

exit(0)
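The guess and confidence lines are easier to follow with a single prediction in hand. Each row of predictions holds ten softmax outputs, one per digit, so argmax picks the most likely digit and the value at that position serves as the confidence. The probabilities in this sketch are made up for illustration and do not come from the trained model.

```python
import numpy as np

# Made-up softmax output for one sample: ten probabilities that sum to 1
fake_prediction = np.array([0.01, 0.02, 0.02, 0.90, 0.01,
                            0.01, 0.01, 0.01, 0.005, 0.005])

guess = int(np.argmax(fake_prediction))     # index of the largest probability
confidence = float(fake_prediction[guess])  # the probability at that index

print('program guesses ' + str(guess) + ' with confidence of ' + '{:.2%}'.format(confidence))
```

For this made-up row, the program would report a guess of 3 with a confidence of 90.00%.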
This is the entirety of the code for this module, and I hope you enjoyed it and maybe learned a trick or two. I think I’ll move on to alphabetic characters next…