The EMNIST Dataset — Handwriting Recognition in Deep Machine Learning, Part II

In my previous post, I showed the support functions necessary to read and save the EMNIST handwriting recognition datasets for use in deep machine learning programs.  In this installment I’ll cover the entire working program, line by line, with the result that you’ll be able to visualize individual characters from the testing database, like this:

[Figure: emnist3, a handwritten numeral three from the testing dataset]

We can all see that’s a numeral three.  Let’s see how we got here.

The Imports

There are numerous imports needed to make this work.  We need some tools from keras and TensorFlow of course, and we need numpy to save and restore the arrays from disk.  We also need the matplotlib package to visualize the character as shown above.

from keras.models import Sequential
from keras.layers import Dense, Flatten  # the only two layer types used below
import numpy as np
import tensorflow as tf
from numpy import argmax
import matplotlib.pyplot as plt

Reading the File Headers

In addition to what we did in the earlier installment of this article, we’ll also be adding the function to read the “labels” file header. This label file contains one byte per sample that represents the “right answer” as to what its corresponding digit in the image dataset is. So for example, if the image at index 2002 is a “3” then the label file will have 0x03 at index 2002.  But as in the last installment, the main reason to read the headers is to gather sample metadata and return that and the starting position of the data in a tuple for use in the main program.

def read_image_file_header(filename):
    # the with block closes the file even if a read fails
    with open(filename, 'rb') as f:
        f.read(4)  # magic number, discard it
        count = int.from_bytes(f.read(4), byteorder='big')  # number of samples in data set
        rows = int.from_bytes(f.read(4), byteorder='big')  # rows per image
        columns = int.from_bytes(f.read(4), byteorder='big')  # columns per image
        pos = f.tell()  # current position used as offset later when reading data
    return pos, count, rows, columns


def read_label_file_header(filename):
    with open(filename, 'rb') as f:
        f.read(4)  # magic number, discard it
        f.read(4)  # sample count, discard it (the image header already supplies it)
        pos = f.tell()  # offset of the first label byte
    return pos
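As an alternative to the repeated int.from_bytes calls, the same headers can be unpacked in one step with the standard struct module. This is only a sketch: the function names parse_image_header and parse_label_header are made up for illustration, and the header bytes are built in memory rather than read from the real EMNIST files.

```python
import io
import struct


def parse_image_header(stream):
    """Read the 16-byte image header; return (offset, count, rows, columns)."""
    # '>' means big-endian, each 'I' is one unsigned 32-bit integer
    magic, count, rows, columns = struct.unpack('>IIII', stream.read(16))
    return stream.tell(), count, rows, columns


def parse_label_header(stream):
    """Read the 8-byte label header; return (offset, count)."""
    magic, count = struct.unpack('>II', stream.read(8))
    return stream.tell(), count


# Header bytes built in memory for illustration (3 samples of 28 x 28)
fake_image_header = struct.pack('>IIII', 2051, 3, 28, 28)
fake_label_header = struct.pack('>II', 2049, 3)

image_info = parse_image_header(io.BytesIO(fake_image_header))
label_info = parse_label_header(io.BytesIO(fake_label_header))
print(image_info)  # (16, 3, 28, 28)
print(label_info)  # (8, 3)
```

Returning the sample count from the label header as well makes it possible to check that the image and label files agree before reading any data.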

Reading the Images and Labels

These matching functions, again for images and their corresponding labels, read the EMNIST dataset as described in the earlier article and save the resulting numpy arrays to disk. This allows them to be imported directly into keras. You’ll see that the method of reading is quite similar, except the image reader reads and constructs a 28×28 array, while the label reader reads a single byte at a time into a 1D list.

def load_save_images(inputfilename, byte_offset, outputfilename, cols, rows, count):
    list_data = []
    infile = open(inputfilename, 'rb')
    infile.seek(byte_offset)
    for n in range(count):
        image_matrix = [[0 for x in range(cols)] for y in range(rows)]
        for r in range(rows):
            for c in range(cols):
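                byte = infile.read(1)
                # note the [c][r] order: each image is stored transposed relative
                # to the file's byte order, which is only safe for square images
                image_matrix[c][r] = float(ord(byte))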
        list_data.append(image_matrix)
        # show progress
        if n % 5000 == 0:
            print("... " + str(n))
    infile.close()
    print('converting to numpy array')
    list_data = np.array(list_data, dtype=np.float32)
    print('normalizing')
    list_data = tf.keras.utils.normalize(list_data, axis=1)
    print('saving')
    np.save(outputfilename, list_data)


def load_save_labels(inputfilename, offset, outputfilename, count):
    list_data = []
    infile = open(inputfilename, 'rb')
    infile.seek(offset)
    for n in range(count):
        byte = infile.read(1)
        list_data.append(ord(byte))
        # show progress
        if n % 5000 == 0:
            print("... " + str(n))
    infile.close()
    print('converting to numpy array')
    list_data = np.array(list_data, dtype=np.uint8)
    print('saving')
    np.save(outputfilename, list_data)
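Reading one byte at a time is easy to follow but slow. A possible faster variant, sketched here under the assumption that the whole file fits in memory, lets numpy interpret the bytes in bulk with np.frombuffer. The helper names images_from_bytes and labels_from_bytes are hypothetical, and the tiny 2×2 "images" are made-up test data. The transpose mirrors the image_matrix[c][r] indexing in the loop version; normalization is left out.

```python
import numpy as np


def images_from_bytes(raw, byte_offset, count, rows, cols):
    """Turn raw file bytes into a (count, cols, rows) float32 array."""
    pixels = np.frombuffer(raw, dtype=np.uint8,
                           count=count * rows * cols, offset=byte_offset)
    images = pixels.reshape(count, rows, cols)
    # swap the last two axes, matching image_matrix[c][r] in the loop version
    return images.transpose(0, 2, 1).astype(np.float32)


def labels_from_bytes(raw, byte_offset, count):
    """Turn raw file bytes into a 1D uint8 array of labels."""
    labels = np.frombuffer(raw, dtype=np.uint8, count=count, offset=byte_offset)
    return labels.copy()  # copy so the result is writable


# Made-up data: a 16-byte dummy header followed by two 2x2 images
raw_images = bytes(16) + bytes(range(8))
demo_images = images_from_bytes(raw_images, 16, count=2, rows=2, cols=2)

# Made-up data: an 8-byte dummy header followed by two labels
raw_labels = bytes(8) + bytes([3, 7])
demo_labels = labels_from_bytes(raw_labels, 8, count=2)

print(demo_images[0])
print(demo_labels)
```

With real files, raw would come from open(filename, 'rb').read(), and the offset and counts from the header functions.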

One improvement over the version posted in the previous article is

    list_data = np.array(list_data, dtype=np.float32)
and
    list_data = np.array(list_data, dtype=np.uint8)

Specifying a smaller data type with the dtype keyword argument results in arrays that are quite a bit smaller when saved to disk, and they perform just as well for this class of problem.
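To put rough numbers on the savings, here is a small sketch comparing the in-memory size of arrays with different dtypes. The 1000-sample arrays are made-up stand-ins for the real data. A .npy file is essentially a short header plus the raw array buffer, so the size on disk tracks nbytes closely.

```python
import numpy as np

# Made-up stand-ins: 1000 images of 28 x 28 and 1000 labels
wide_images = np.zeros((1000, 28, 28), dtype=np.float64)
small_images = wide_images.astype(np.float32)
wide_labels = np.zeros(1000, dtype=np.int64)
small_labels = wide_labels.astype(np.uint8)

print(wide_images.nbytes, small_images.nbytes)  # 6272000 3136000
print(wide_labels.nbytes, small_labels.nbytes)  # 8000 1000
```

The image array halves in size, and the label array shrinks by a factor of eight, since every label fits in a single byte.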

The Main Program

That’s it for the support functions. Next we begin on the main program. Because some of these operations are lengthy, I first define a few boolean variables to allow us to “turn off” processing of certain sections once they run successfully. This allows us to save time and disk activity for data that, once computed, will remain static unless the program is changed.

reReadAllData = True
reTrain = True
reEvaluate = True
doPredictions = True
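One possible refinement of these hand-edited flags is to let the program check whether the saved output already exists, and keep the flag only as a way to force a rebuild. This is a sketch, not part of the program in this post: needs_rebuild is a hypothetical helper, and the demonstration uses a temporary directory rather than the real array files.

```python
import os
import tempfile


def needs_rebuild(output_path, force=False):
    """Return True if the output file is missing or a rebuild is forced."""
    return force or not os.path.exists(output_path)


# Demonstration in a throwaway directory
with tempfile.TemporaryDirectory() as workdir:
    cache_file = os.path.join(workdir, 'images_train_array.npy')
    before = needs_rebuild(cache_file)              # file not written yet
    open(cache_file, 'wb').close()                  # pretend the array was saved
    after = needs_rebuild(cache_file)               # file now exists
    forced = needs_rebuild(cache_file, force=True)  # flag overrides the check

print(before, after, forced)  # True False True
```

The trade-off is that an existence check cannot tell whether a saved file is stale, so a manual override is still useful after changing the reading code.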

Reading and Saving the Datasets

This is where the most time-consuming code is, as we read each of the datasets and their label files byte by byte and then construct, normalize (for the image data), and save the arrays out to disk. This block is guarded by the boolean reReadAllData so that once it has run successfully, this variable can be set to False, saving time.

if reReadAllData:

    input_filename = 'emnist-digits-train-images-idx3-ubyte'
    offset, sample_count, rows_per_image, columns_per_image = read_image_file_header(input_filename)
    load_save_images(input_filename, offset, 'images_train_array', columns_per_image, rows_per_image, sample_count)

    input_filename = 'emnist-digits-train-labels-idx1-ubyte'
    offset = read_label_file_header(input_filename)
    load_save_labels(input_filename, offset, 'labels_train_array', sample_count)

    input_filename = 'emnist-digits-test-images-idx3-ubyte'
    offset, sample_count, rows_per_image, columns_per_image = read_image_file_header(input_filename)
    load_save_images(input_filename, offset, 'images_test_array', columns_per_image, rows_per_image, sample_count)

    input_filename = 'emnist-digits-test-labels-idx1-ubyte'
    offset = read_label_file_header(input_filename)
    load_save_labels(input_filename, offset, 'labels_test_array', sample_count)
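The normalize step inside load_save_images is the least obvious part of this stage. As far as I can tell, tf.keras.utils.normalize performs L2 normalization along the given axis: each vector along that axis is divided by its Euclidean length. The numpy sketch below shows that idea on a made-up 2×2 "image"; l2_normalize is a hypothetical stand-in, not the library function itself.

```python
import numpy as np


def l2_normalize(x, axis=1):
    """Divide each vector along `axis` by its Euclidean (L2) length."""
    norms = np.linalg.norm(x, ord=2, axis=axis, keepdims=True)
    norms[norms == 0] = 1  # leave all-zero vectors unchanged, avoid dividing by zero
    return x / norms


# One made-up 2x2 image: first column holds (3, 4), second column is blank
demo = np.array([[[3.0, 0.0],
                  [4.0, 0.0]]], dtype=np.float32)
normalized = l2_normalize(demo, axis=1)
print(normalized[0])
```

The column (3, 4) has length 5, so it becomes (0.6, 0.8), while the blank column stays at zero. Every resulting value lies between 0 and 1 for non-negative pixel data, which is a range the network trains on comfortably.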

Setting Up the Model

Now we instantiate the keras model and, if reTrain is True, load the training data from disk, configure the model, compile it, and fit it to the data. Finally, in case this works to our satisfaction, we save the keras/TensorFlow model to disk so that we don’t need to do this again.

model = Sequential()

if reTrain:

    print('training model')
    x_train = np.load('images_train_array.npy')
    y_train = np.load('labels_train_array.npy')

    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dense(128, activation='relu'))
    model.add(Dense(10, activation='softmax'))

    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    model.fit(x_train, y_train, epochs=3)

    print('saving model to disk')
    model.save('digit_recognizer.h5')
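The last layer and the loss function deserve a closer look. Softmax turns the ten raw outputs into probabilities that sum to 1, and sparse categorical crossentropy scores them against integer labels, which is why the labels can stay as plain digits rather than one-hot vectors. The numpy sketch below illustrates the arithmetic only. It is not how keras implements it (keras also clips probabilities for numerical safety, which is omitted here), and the inputs are made up.

```python
import numpy as np


def softmax(logits):
    """Convert raw scores to probabilities along the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)  # for numerical stability
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)


def sparse_categorical_crossentropy(probabilities, labels):
    """Mean negative log-probability assigned to each correct integer label."""
    picked = probabilities[np.arange(len(labels)), labels]
    return float(np.mean(-np.log(picked)))


# Made-up example: an untrained model that scores all ten digits equally
raw_scores = np.zeros((2, 10))
true_digits = np.array([3, 7])

probs = softmax(raw_scores)
loss = sparse_categorical_crossentropy(probs, true_digits)
print(probs[0])
print(loss)
```

With every digit given a probability of 0.1, the loss is ln(10), roughly 2.30. That is a handy reference point: a loss well below it during training means the model is doing better than guessing.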

Reloading the Model with Testing Data

Next we reload the model from disk and load the testing data in order to evaluate the model and have data for interactive mode, which is coming up soon. If reEvaluate is set, we run the model.evaluate() function to see what kind of accuracy we get on the testing dataset based on the earlier training. We’ll see that the model as configured is very strong.

print('loading model from disk')
model = tf.keras.models.load_model('digit_recognizer.h5')

print('loading testing arrays')
x_test = np.load('images_test_array.npy')
y_test = np.load('labels_test_array.npy')

if reEvaluate:

    print('running model.evaluate against test data')
    val_loss, val_acc = model.evaluate(x_test, y_test)
    print('evaluation: ', val_loss, val_acc)
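The accuracy figure reported by the evaluation is simply the fraction of samples where the highest-probability class matches the label. A minimal numpy sketch of that calculation, using made-up predictions for three samples and three classes, looks like this; the accuracy function here is illustrative and not part of keras.

```python
import numpy as np


def accuracy(predictions, labels):
    """Fraction of rows whose most probable class equals the label."""
    guesses = np.argmax(predictions, axis=1)
    return float(np.mean(guesses == labels))


# Made-up probabilities for three samples and three classes
demo_predictions = np.array([[0.1, 0.8, 0.1],
                             [0.7, 0.2, 0.1],
                             [0.2, 0.3, 0.5]])
demo_labels = np.array([1, 0, 1], dtype=np.uint8)

demo_accuracy = accuracy(demo_predictions, demo_labels)
print(demo_accuracy)
```

The first two guesses are right and the third is wrong, so the result is two thirds.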

Doing Interactive Predictions

Finally, we read the sample count for the testing database again (just to make sure we have it right with all these boolean variables affecting program flow). Then we enter an endless loop that asks for user input, takes the integer index entered, and gets the “guess” for that character from the predictions, as well as the “right answer” from the labels dataset. As a last step, we use matplotlib’s imshow function to create a default plot of the 28×28 character data in grayscale mode.

[Figure: emnist5, the matplotlib plot window showing a test character in grayscale]

Dismissing the plot window when done lets the while loop repeat until “quit” is entered.

pos_image, sample_count, rows_per_image, columns_per_image = read_image_file_header('emnist-digits-test-images-idx3-ubyte')
print(pos_image, sample_count, rows_per_image, columns_per_image)

if doPredictions:
    print('sample size = ' + str(sample_count))
    predictions = model.predict(x_test)  # pass the array directly, no list wrapper needed
    while True:
        command = input('Enter a sample number to view, "quit" to stop : ')
        if command.upper() == 'QUIT': break
        index = int(command)
        guess = argmax(predictions[index])
        confidence = float(predictions[index][guess])
        print('program guesses ' + str(guess) + ' with confidence of ' + '{:.2%}'.format(confidence))
        print('correct answer : ' + str(y_test[index]))
        plt.imshow(x_test[index], cmap='gray')
        plt.show()

exit(0)
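As written, the loop stops with an exception if the input is not a number or is outside the range of the dataset. One way to harden it is to separate the parsing from the plotting. The sketch below is a suggestion only: parse_command is a hypothetical helper, and the sample count of 40000 is a placeholder rather than a value read from the file header.

```python
def parse_command(command, sample_count):
    """Classify user input as ('quit', None), ('show', index) or ('error', message)."""
    text = command.strip()
    if text.upper() == 'QUIT':
        return 'quit', None
    try:
        index = int(text)
    except ValueError:
        return 'error', 'please enter a whole number or "quit"'
    if not 0 <= index < sample_count:
        return 'error', 'index must be between 0 and ' + str(sample_count - 1)
    return 'show', index


# Placeholder sample count for illustration
results = [parse_command(text, 40000) for text in ['2002', 'quit', 'abc', '99999']]
for result in results:
    print(result)
```

In the main loop, a 'show' result would lead to the prediction and plot, a 'quit' result would break out, and an 'error' result would print the message and ask again.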

This is the entirety of the code for this module, and I hope you enjoyed it and maybe learned a trick or two. I think I’ll move on to alphabetic characters next…
