Adding Two Numbers Using Keras — Machine Learning

Hi all. Join me in exploring how to use Keras, the Python machine learning front end, to create a simple program that learns to add numbers.

The Problem to Solve

This was my first attempt at using Keras, so I chose what I thought would be a simple problem.  I wanted to create random pairs of numbers in some range (between, say, zero and one hundred) and use them to train the model to add the numbers and report the result.

First, Some Results

Before diving into this, let’s see the results of adding some numbers.  I’ll excerpt a few here.  We start with integers in the range of zero to one hundred, but then try some other values outside that range: larger numbers, negative numbers, decimals, and fractions (using the Python Fraction() type).

   0    18.000000    58.000000    76.000011   76
   1    58.000000    98.000000   155.999982  156
   2    22.000000    90.000000   111.999989  112
   3    50.000000    93.000000   142.999995  143
   4    44.000000    55.000000    99.000001   99
   5    64.000000    14.000000    78.000009   78
   6    68.000000    15.000000    83.000010   83
   7    10.000000    94.000000   103.999996  104
   8    58.000000    33.000000    90.999997   91
   9     6.000000    84.000000    89.999998   90
  10    82.000000    26.000000   108.000004  108
  11    42.000000    29.000000    71.000016   71
  12    39.000000    98.000000   136.999989  137
  13    26.000000    22.000000    48.000023   48
  14    18.000000    24.000000    42.000020   42
  15    44.000000    47.000000    91.000009   91
  16    80.000000    52.000000   131.999993  132
  17    26.000000    51.000000    77.000004   77
  18    59.000000    71.000000   129.999995  130
  19    35.000000    48.000000    83.000010   83

   0  1200.000000  1343.000000  2542.999268  2543.00
   1     1.000000     1.000000     2.000033     2.00
   2    -3.000000    -3.000000    -5.999964    -6.00
   3     3.200000     3.250000     6.450031     6.45

So you can see (in the fourth column, which contains the computed value) that the program we’ll write together here adds numbers to a high degree of precision, and even performs well outside the range of the data on which it was trained.

Support Methods First

The first function I wrote returns two NumPy arrays, one holding the input pairs and one holding their sums, so that each training example has the form

([x1, x2], y)

where x1 and x2 are the two numbers to add (addend and augend, respectively), and y is the actual sum of the two.

from numpy import array
from random import seed, randint

def create_triples(count, max_value):
    addends = list()
    sums = list()
    for n in range(count):
        addends.append([randint(0, max_value), randint(0, max_value)])
        sums.append(sum(addends[n]))
    addends = array(addends)
    sums = array(sums)
    return addends, sums
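
The same data can also be produced without an explicit Python loop. Below is a minimal sketch of an equivalent function, assuming NumPy's Generator API; the name create_triples_vectorized and its rng parameter are hypothetical, not part of the original program. It draws from a different random number generator than random.randint, so it will not reproduce the exact numbers shown in this post.

```python
import numpy as np


def create_triples_vectorized(count, max_value, rng=None):
    # Hypothetical helper: same output shapes as create_triples(), no Python loop.
    rng = np.random.default_rng() if rng is None else rng
    # endpoint=True makes max_value inclusive, matching randint(0, max_value)
    addends = rng.integers(0, max_value, size=(count, 2), endpoint=True)
    sums = addends.sum(axis=1)
    return addends, sums


addends, sums = create_triples_vectorized(5, 100, np.random.default_rng(100))
print(addends.shape, sums.shape)  # (5, 2) (5,)
```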

Now, one of the first things I learned from studying machine learning is that everyone “normalizes” their data, usually scaling it to a range between 0 and 1 or -1 and 1. I don’t yet know the formal reasons for this, but from other examples I found, I created a pair of methods to normalize the data and then to invert that same operation.

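def denormalize(value, maxvalue):
    # undo normalize(): scale back up by twice the maximum input value
    return value * float(maxvalue * 2.0)


def normalize(value, maxvalue):
    # value must be a NumPy array; a sum can reach 2 * maxvalue, so divide by that
    return value.astype('float') / float(maxvalue * 2.0)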
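
A quick round trip shows what these two functions do. This is a small sketch that repeats the definitions so it runs on its own; the sample values are arbitrary. Dividing by twice the maximum matters because the sum of two numbers up to maxvalue can reach 2 * maxvalue, so one scale keeps both the inputs and the target sums between 0 and 1.

```python
from numpy import array, allclose


def denormalize(value, maxvalue):
    return value * float(maxvalue * 2.0)


def normalize(value, maxvalue):
    return value.astype('float') / float(maxvalue * 2.0)


pairs = array([[0, 100], [25, 75], [100, 100]])   # arbitrary sample inputs
sums = pairs.sum(axis=1)                          # 100, 100, 200

scaled_pairs = normalize(pairs, 100)   # every entry lies in [0, 0.5]
scaled_sums = normalize(sums, 100)     # every entry lies in [0, 1]

# denormalize() undoes normalize()
round_trip_ok = allclose(denormalize(scaled_pairs, 100), pairs)
print(scaled_sums, round_trip_ok)
```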

The next major step is setting up the neural network model. The first layer takes input_dim=2 since we are adding two numbers; it and the layer after it are small Dense layers (four and two units), and a final single-unit Dense layer produces the one sum the model outputs.

from keras.models import Sequential, load_model
from keras.layers import Dense
def setup_model(m):
    m.add(Dense(4, input_dim=2))
    m.add(Dense(2))
    m.add(Dense(1))
    m.compile(loss='mean_squared_error', optimizer='adam')
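
One thing worth noticing about this model: a Keras Dense layer applies no activation function unless one is given, so each layer here computes inputs times weights plus a bias. Stacking such layers still yields a single linear function of the two inputs. The NumPy sketch below illustrates that with made-up random weights (W1, b1, and so on are hypothetical stand-ins, not weights from the trained model). Addition is itself linear, which may be part of why such a small network can represent it and why its predictions can hold up outside the training range.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weights standing in for three Dense layers (2 -> 4 -> 2 -> 1)
W1, b1 = rng.normal(size=(2, 4)), rng.normal(size=4)
W2, b2 = rng.normal(size=(4, 2)), rng.normal(size=2)
W3, b3 = rng.normal(size=(2, 1)), rng.normal(size=1)


def stacked(x):
    # three linear layers applied one after another, no activation
    return ((x @ W1 + b1) @ W2 + b2) @ W3 + b3


# the same function written as ONE linear layer
W = W1 @ W2 @ W3
b = (b1 @ W2 + b2) @ W3 + b3


def collapsed(x):
    return x @ W + b


x = rng.uniform(-5, 5, size=(10, 2))
same = np.allclose(stacked(x), collapsed(x))
print(same)  # True
```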

Next, we need code to train the model.  I packaged my training up in a function so that later, once the model is trained, I can omit calling this, and thus save the time and variability added by retraining every time. Plus, it’s just good structured programming practice.

def train_model(m):
    for _ in range(50):
        x, y = create_triples(valueCount, maxValue)
        x2 = normalize(x, maxValue)
        y2 = normalize(y, maxValue)
        m.fit(x2, y2, epochs=3, batch_size=2, verbose=0)

def save_model(m, filename):
    m.save(filename)

I also tossed in the code to save the model.  Given the model m, train_model() runs 50 iterations, each of three training “epochs.”  In each iteration we create a fresh set of number pairs and sums, normalize them, and run the model’s fit() function, which adjusts the weights of the neural network connections to reduce the mean squared error, the loss chosen above in the model setup.
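
To make the idea of adjusting weights to shrink the mean squared error concrete, here is a minimal NumPy sketch. It is not what Keras does internally: it uses plain full-batch gradient descent rather than the Adam optimizer, and a single linear unit rather than three layers. The learning rate and step count are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(100)
max_value = 100

# training data, normalized the same way as in the post
x = rng.integers(0, max_value, size=(100, 2), endpoint=True) / (max_value * 2.0)
y = x.sum(axis=1)

w = np.zeros(2)        # one weight per input
b = 0.0                # bias
learning_rate = 0.5    # arbitrary choice for this sketch

for _ in range(5000):
    error = x @ w + b - y                  # prediction minus target
    grad_w = 2.0 * x.T @ error / len(y)    # gradient of the MSE w.r.t. w
    grad_b = 2.0 * error.mean()            # gradient of the MSE w.r.t. b
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

mse = float(np.mean((x @ w + b - y) ** 2))
print(w, b, mse)
```

In this sketch the weights drift toward 1 and the bias toward 0, which is exactly the rule "output = first input + second input."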

On To the Main Program

The first step in the main program is to call seed() with a constant value so that the random numbers generated are reproducible across all our testing runs.  Next, we set a couple of constant values and create an instance of a Keras model object, as we’ll need the object whether we’re retraining or just using it.  Finally, I create a boolean variable that determines whether we load the existing model (saved after training) or retrain the model.

seed(100)
valueCount: int = 100
maxValue: int = 100
model = Sequential()

# decide whether to load/use existing model or retrain and save changed model
use_existing_model = True

if use_existing_model:
    model = load_model('trained_model_2.h5')
else:
    setup_model(model)
    train_model(model)
    model.save('trained_model_2.h5')

Note: the .h5 extension indicates the HDF5 file format, which Keras can use to save models.

Testing the Model’s Performance

Next, we create some more random data, but this time for testing purposes. Again, we normalize the test data and create a result set with the model.predict() function. This function will create a list of test results that we can then view.

# evaluate model
x, y = create_triples(count=20, max_value=maxValue)
x2 = normalize(x, maxValue)
testresult = model.predict(x2, batch_size=1, verbose=0)

Showing Results

Here is the final loop, in which we present results. Remembering that the predictions are returned as an array with one row per input pair, we can loop through it and print the results. You’ll also see in the print() statement that I show five values:

    1. the iteration number
    2. the first addend
    3. the second addend (named augend in the code)
    4. the value returned from the model
    5. the value of y[i], which is the actual sum computed in the create_triples() function

# show results
for i in range(len(testresult)):
    addend = denormalize(x2[i][0], maxValue)
    augend = denormalize(x2[i][1], maxValue)
    total = denormalize(testresult[i][0], maxValue)
    print('{:4d} {:12.6f} {:12.6f} {:12.6f} {:4d}'.format(i, addend, augend, total, y[i]))

The Output

   0    18.000000    58.000000    76.000011   76
   1    58.000000    98.000000   155.999982  156
   2    22.000000    90.000000   111.999989  112
   3    50.000000    93.000000   142.999995  143
   4    44.000000    55.000000    99.000001   99
   5    64.000000    14.000000    78.000009   78
   6    68.000000    15.000000    83.000010   83
   7    10.000000    94.000000   103.999996  104
   8    58.000000    33.000000    90.999997   91
   9     6.000000    84.000000    89.999998   90
  10    82.000000    26.000000   108.000004  108
  11    42.000000    29.000000    71.000016   71
  12    39.000000    98.000000   136.999989  137
  13    26.000000    22.000000    48.000023   48
  14    18.000000    24.000000    42.000020   42
  15    44.000000    47.000000    91.000009   91
  16    80.000000    52.000000   131.999993  132
  17    26.000000    51.000000    77.000004   77
  18    59.000000    71.000000   129.999995  130
  19    35.000000    48.000000    83.000010   83
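
One way to summarize a table like this is to compute the gap between the model's output and the true sum. The sketch below does that for the first five rows above, copied by hand; it is an illustration, not part of the original program.

```python
# (model output, actual sum) for rows 0-4 of the output above
rows = [
    (76.000011, 76),
    (155.999982, 156),
    (111.999989, 112),
    (142.999995, 143),
    (99.000001, 99),
]

abs_errors = [abs(predicted - actual) for predicted, actual in rows]
max_error = max(abs_errors)
mean_error = sum(abs_errors) / len(abs_errors)

print('max abs error:  {:.6f}'.format(max_error))
print('mean abs error: {:.6f}'.format(mean_error))
```

For these five rows, every output is within 0.00002 of the true sum.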

Making Individual Predictions

To make individual predictions, all we have to do is create another testing dataset in the same shape as the ones already used. Remembering that the dataset is really a list of lists, we just create one with the special values we want to test. We then create a numpy array out of this list of lists, normalize it, run model.predict() again on the data, and loop through the results just as before. Since there is no precomputed sum this time, the last column simply repeats the model’s output rounded to two decimal places.

from fractions import Fraction

# do predictions of hand-coded values
x = [[1200, 1343], [1, 1], [-3, -3], [Fraction(16, 5), 3.25]]
x = array(x)
x2 = normalize(x, maxValue)
testresult = model.predict(x2, batch_size=1, verbose=0)

# show results
for i in range(len(testresult)):
    addend = denormalize(x2[i][0], maxValue)
    augend = denormalize(x2[i][1], maxValue)
    total = denormalize(testresult[i][0], maxValue)
    print('{:4d} {:12.6f} {:12.6f} {:12.6f} {:8.2f}'.format(i, addend, augend, total, total))

exit(0)
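
If you make one-off predictions often, the normalize, predict, and denormalize steps can be wrapped in a small helper. The sketch below rests on assumptions: predict_sum is a hypothetical name, and StandInModel is a fake model that adds its two inputs exactly, included only so the snippet runs without a trained Keras model. With the real model you would pass the loaded model object instead.

```python
import numpy as np
from fractions import Fraction


class StandInModel:
    # Hypothetical stand-in: mimics the (n, 1) shape of predict() output,
    # but simply adds the two normalized inputs.
    def predict(self, x, batch_size=None, verbose=0):
        return x.sum(axis=1, keepdims=True)


def predict_sum(m, a, b, max_value):
    # Hypothetical helper: normalize one pair, predict, then denormalize.
    x = np.array([[float(a), float(b)]])
    x_norm = x / (max_value * 2.0)
    y_norm = m.predict(x_norm, batch_size=1, verbose=0)
    return float(y_norm[0][0]) * (max_value * 2.0)


result = predict_sum(StandInModel(), Fraction(16, 5), 3.25, 100)
print('{:.2f}'.format(result))  # 6.45
```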

Some Caveats

This was my first experiment in neural networking for deep machine learning. There are doubtless many ways this could be improved, and if you have comments on improvements I’d be glad to learn them. But it was a very satisfying exercise and shows the power of even a simple deep learning model.

The Complete Program

from keras.models import Sequential, load_model
from keras.layers import Dense
from random import seed, randint
from numpy import array
from fractions import Fraction


def create_triples(count, max_value):
    addends = list()
    sums = list()
    for n in range(count):
        addends.append([randint(0, max_value), randint(0, max_value)])
        sums.append(sum(addends[n]))
    addends = array(addends)
    sums = array(sums)
    return addends, sums


def denormalize(value, maxvalue):
    return value * float(maxvalue * 2.0)


def normalize(value, maxvalue):
    return value.astype('float') / float(maxvalue * 2.0)


def setup_model(m):
    m.add(Dense(4, input_dim=2))
    m.add(Dense(2))
    m.add(Dense(1))
    m.compile(loss='mean_squared_error', optimizer='adam')


def train_model(m):
    for _ in range(50):
        x, y = create_triples(valueCount, maxValue)
        x2 = normalize(x, maxValue)
        y2 = normalize(y, maxValue)
        m.fit(x2, y2, epochs=3, batch_size=2, verbose=0)


def save_model(m, filename):
    m.save(filename)


# ----------------------------------------------------------
seed(100)
valueCount: int = 100
maxValue: int = 100
model = Sequential()

# decide whether to load/use existing model or retrain and save changed model
use_existing_model = True

if use_existing_model:
    model = load_model('trained_model_2.h5')
else:
    setup_model(model)
    train_model(model)
    model.save('trained_model_2.h5')

# evaluate model
x, y = create_triples(count=20, max_value=maxValue)
x2 = normalize(x, maxValue)
testresult = model.predict(x2, batch_size=1, verbose=0)

# show results
for i in range(len(testresult)):
    addend = denormalize(x2[i][0], maxValue)
    augend = denormalize(x2[i][1], maxValue)
    total = denormalize(testresult[i][0], maxValue)
    print('{:4d} {:12.6f} {:12.6f} {:12.6f} {:4d}'.format(i, addend, augend, total, y[i]))

print("\r\n")

# do predictions of hand-coded values
x = [[1200, 1343], [1, 1], [-3, -3], [Fraction(16, 5), 3.25]]
x = array(x)
x2 = normalize(x, maxValue)
testresult = model.predict(x2, batch_size=1, verbose=0)

# show results
for i in range(len(testresult)):
    addend = denormalize(x2[i][0], maxValue)
    augend = denormalize(x2[i][1], maxValue)
    total = denormalize(testresult[i][0], maxValue)
    print('{:4d} {:12.6f} {:12.6f} {:12.6f} {:8.2f}'.format(i, addend, augend, total, total))

exit(0)
