Hi all! Join me on this exploration of using Keras, a Python machine-learning front end, to create a simple program that learns to add numbers.
The Problem to Solve
This was my first attempt at using Keras, so I chose what I thought would be a simple problem. I wanted to create random couples of numbers in some range (between, say, zero and one hundred) and use them to train the model to add the numbers and report the result.
First, Some Results
Before diving into this, let’s see the results of adding some numbers; I’ll excerpt a few here. We start with integers in the range of zero to one hundred, but then try some values outside that range: larger numbers, negative numbers, decimals, and fractions (using the Python Fraction() type).
```
   0    18.000000    58.000000    76.000011   76
   1    58.000000    98.000000   155.999982  156
   2    22.000000    90.000000   111.999989  112
   3    50.000000    93.000000   142.999995  143
   4    44.000000    55.000000    99.000001   99
   5    64.000000    14.000000    78.000009   78
   6    68.000000    15.000000    83.000010   83
   7    10.000000    94.000000   103.999996  104
   8    58.000000    33.000000    90.999997   91
   9     6.000000    84.000000    89.999998   90
  10    82.000000    26.000000   108.000004  108
  11    42.000000    29.000000    71.000016   71
  12    39.000000    98.000000   136.999989  137
  13    26.000000    22.000000    48.000023   48
  14    18.000000    24.000000    42.000020   42
  15    44.000000    47.000000    91.000009   91
  16    80.000000    52.000000   131.999993  132
  17    26.000000    51.000000    77.000004   77
  18    59.000000    71.000000   129.999995  130
  19    35.000000    48.000000    83.000010   83

   0  1200.000000  1343.000000  2542.999268  2543.00
   1     1.000000     1.000000     2.000033     2.00
   2    -3.000000    -3.000000    -5.999964    -6.00
   3     3.200000     3.250000     6.450031     6.45
```
So you can see (in the fourth column, which contains the computed value) that the program we’ll write together here adds numbers to a high degree of precision, and even performs well outside the range of the data on which it was trained.
Support Methods First
The first function I wrote generates the training data. It returns a pair of arrays: one of [x1, x2] pairs, where x1 and x2 are the two numbers to add (addend and augend, respectively), and one of the corresponding values y, the actual sum of each pair.
```python
from numpy import array
from random import seed, randint

def create_triples(count, max_value):
    addends = list()
    sums = list()
    for n in range(count):
        addends.append([randint(0, max_value), randint(0, max_value)])
        sums.append(sum(addends[n]))
    addends = array(addends)
    sums = array(sums)
    return addends, sums
```
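As a quick sanity check, calling the function with a small count shows the shapes we expect: a (count, 2) array of pairs and a matching 1-D array of sums. This is just a sketch to illustrate the data layout, not part of the program itself.

```python
from random import seed, randint
from numpy import array

def create_triples(count, max_value):
    addends = list()
    sums = list()
    for n in range(count):
        addends.append([randint(0, max_value), randint(0, max_value)])
        sums.append(sum(addends[n]))
    return array(addends), array(sums)

seed(100)
x, y = create_triples(5, 100)
print(x.shape)                     # (5, 2): five [x1, x2] pairs
print(y.shape)                     # (5,): five sums
print((x.sum(axis=1) == y).all())  # True: each y really is x1 + x2
```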
Now, one of the first things I learned from studying machine learning is that everyone “normalizes” their data, usually scaling it to a range between 0 and 1 or -1 and 1. I don’t yet know the formal reasons for this, but from other examples I found, I created a pair of methods to normalize the data and then to invert that same operation.
```python
def denormalize(value, maxvalue):
    return value * float(maxvalue * 2.0)

def normalize(value, maxvalue):
    return value.astype('float') / float(maxvalue * 2.0)
```
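Since both helpers just scale by the same constant, 2 * maxvalue, denormalize() undoes normalize() exactly (up to floating-point rounding). A quick check of the round trip, reusing the two helpers above:

```python
from numpy import array, allclose

def denormalize(value, maxvalue):
    return value * float(maxvalue * 2.0)

def normalize(value, maxvalue):
    return value.astype('float') / float(maxvalue * 2.0)

values = array([0, 50, 100, 200, -3])
roundtrip = denormalize(normalize(values, 100), 100)
print(allclose(roundtrip, values))  # True
```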
The next major step is setting up the neural network model. I chose a first Dense layer with input_dim=2, since we are adding two numbers, put a small Dense layer in the middle, and finished with a single-unit Dense output layer, since the model produces exactly one number as its output: the sum.
```python
from keras.models import Sequential, load_model
from keras.layers import Dense

def setup_model(m):
    m.add(Dense(4, input_dim=2))
    m.add(Dense(2))
    m.add(Dense(1))
    m.compile(loss='mean_squared_error', optimizer='adam')
```
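Worth noting: none of the Dense layers above specifies an activation function, so each layer is a purely affine map (W·x + b), and a stack of affine maps collapses into a single affine map. That is likely why this model extrapolates so well to values far outside its training range: it only has to learn the linear function y = x1 + x2. A NumPy sketch of the collapse, using made-up weights (not the trained ones) with the same shapes as the model:

```python
import numpy as np

rng = np.random.default_rng(0)
# Three affine layers shaped like the Keras model: 2 -> 4 -> 2 -> 1
W1, b1 = rng.normal(size=(2, 4)), rng.normal(size=4)
W2, b2 = rng.normal(size=(4, 2)), rng.normal(size=2)
W3, b3 = rng.normal(size=(2, 1)), rng.normal(size=1)

x = np.array([[0.09, 0.29]])  # a normalized input pair

# Applying the layers one at a time...
layered = ((x @ W1 + b1) @ W2 + b2) @ W3 + b3

# ...is identical to one combined affine map x @ W + b
W = W1 @ W2 @ W3
b = (b1 @ W2 + b2) @ W3 + b3
collapsed = x @ W + b

print(np.allclose(layered, collapsed))  # True
```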
Next, we need code to train the model. I packaged the training up in a function so that later, once the model is trained, I can skip calling it, saving the time and variability of retraining on every run. Plus, it’s just good structured programming practice.
```python
def train_model(m):
    for _ in range(50):
        x, y = create_triples(valueCount, maxValue)
        x2 = normalize(x, maxValue)
        y2 = normalize(y, maxValue)
        m.fit(x2, y2, epochs=3, batch_size=2, verbose=0)

def save_model(m, filename):
    m.save(filename)
```
I also tossed in the code to save the model. So this code, given the model m, trains it by running through 50 iterations of three training “epochs” each. In each iteration we create a set of tuples, normalize them, and run the model’s fit() function to train it, adaptively adjusting the weights of the neural network connections so that the model produces the smallest mean squared error, the loss specified in the model setup above.
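For reference, the mean squared error that fit() minimizes is just the average of the squared differences between the model’s predictions and the true (normalized) sums. A hand-rolled version, with illustrative made-up values:

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    # average of (prediction - target)^2 over the batch
    return np.mean((np.asarray(y_pred) - np.asarray(y_true)) ** 2)

y_true = np.array([0.38, 0.78, 0.56])               # normalized true sums
y_pred = np.array([0.380001, 0.779999, 0.560002])   # near-perfect predictions
print(mean_squared_error(y_true, y_pred))           # tiny, on the order of 1e-12
```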
On To the Main Program
The first step in the main program is to call seed() with a constant value, so that the random numbers generated are reproducible across all our test runs. Next, we set a couple of constants and declare an instance of a Keras model object, since we’ll need the object whether we’re retraining or just using a saved model. Finally, I create a boolean variable that determines whether we use the existing model (that we saved after training) or retrain it.
```python
seed(100)
valueCount: int = 100
maxValue: int = 100
model = Sequential()

# decide whether to load/use existing model or retrain and save changed model
use_existing_model = True
if use_existing_model:
    model = load_model('trained_model_2.h5')
else:
    setup_model(model)
    train_model(model)
    model.save('trained_model_2.h5')
```
Note: the .h5 extension (HDF5) is a standard extension for saved Keras/TensorFlow models.
Testing the Model’s Performance
Next, we create some more random data, this time for testing purposes. Again, we normalize the test data and produce a result set with the model.predict() function, which returns a list of predictions that we can then examine.
```python
# evaluate model
x, y = create_triples(count=20, max_value=maxValue)
x2 = normalize(x, maxValue)
testresult = model.predict(x2, batch_size=1, verbose=0)
```
Showing Results
Here is the final loop, in which we present the results. Since the predictions come back as a list, we can loop through it and print each result. You’ll see in the print() statement that I show five values:
- the iteration number
- the first addend
- the augend (the second number)
- the value returned from the model
- and the value of y[i], the actual sum computed in the create_triples() function
```python
# show results
for i in range(len(testresult)):
    addend = denormalize(x2[i][0], maxValue)
    augend = denormalize(x2[i][1], maxValue)
    total = denormalize(testresult[i][0], maxValue)
    print('{:4d} {:12.6f} {:12.6f} {:12.6f} {:4d}'.format(i, addend, augend, total, y[i]))
```
The Output
```
   0    18.000000    58.000000    76.000011   76
   1    58.000000    98.000000   155.999982  156
   2    22.000000    90.000000   111.999989  112
   3    50.000000    93.000000   142.999995  143
   4    44.000000    55.000000    99.000001   99
   5    64.000000    14.000000    78.000009   78
   6    68.000000    15.000000    83.000010   83
   7    10.000000    94.000000   103.999996  104
   8    58.000000    33.000000    90.999997   91
   9     6.000000    84.000000    89.999998   90
  10    82.000000    26.000000   108.000004  108
  11    42.000000    29.000000    71.000016   71
  12    39.000000    98.000000   136.999989  137
  13    26.000000    22.000000    48.000023   48
  14    18.000000    24.000000    42.000020   42
  15    44.000000    47.000000    91.000009   91
  16    80.000000    52.000000   131.999993  132
  17    26.000000    51.000000    77.000004   77
  18    59.000000    71.000000   129.999995  130
  19    35.000000    48.000000    83.000010   83
```
Making Individual Predictions
To make individual predictions, all we have to do is create another testing dataset in the same shape as the ones already used. Remembering that the dataset is really a list of lists, we just create one with the specific values we want to test. We then make a numpy array out of this list of lists, normalize it, run model.predict() on it again, and loop through the results just as before.
```python
from fractions import Fraction

# do predictions of hand-coded values
x = [[1200, 1343], [1, 1], [-3, -3], [Fraction(16, 5), 3.25]]
x = array(x)
x2 = normalize(x, maxValue)
testresult = model.predict(x2, batch_size=1, verbose=0)

# show results
for i in range(len(testresult)):
    addend = denormalize(x2[i][0], maxValue)
    augend = denormalize(x2[i][1], maxValue)
    total = denormalize(testresult[i][0], maxValue)
    print('{:4d} {:12.6f} {:12.6f} {:12.6f} {:8.2f}'.format(i, addend, augend, total, total))

exit(0)
```
Some Caveats
This was my first experiment in neural networking for deep machine learning. There are doubtless many ways this could be improved, and if you have comments on improvements I’d be glad to learn them. But it was a very satisfying exercise and shows the power of even a simple deep learning model.
The Complete Program
```python
from keras.models import Sequential, load_model
from keras.layers import Dense
from random import seed, randint
from numpy import array
from fractions import Fraction

def create_triples(count, max_value):
    addends = list()
    sums = list()
    for n in range(count):
        addends.append([randint(0, max_value), randint(0, max_value)])
        sums.append(sum(addends[n]))
    addends = array(addends)
    sums = array(sums)
    return addends, sums

def denormalize(value, maxvalue):
    return value * float(maxvalue * 2.0)

def normalize(value, maxvalue):
    return value.astype('float') / float(maxvalue * 2.0)

def setup_model(m):
    m.add(Dense(4, input_dim=2))
    m.add(Dense(2))
    m.add(Dense(1))
    m.compile(loss='mean_squared_error', optimizer='adam')

def train_model(m):
    for _ in range(50):
        x, y = create_triples(valueCount, maxValue)
        x2 = normalize(x, maxValue)
        y2 = normalize(y, maxValue)
        m.fit(x2, y2, epochs=3, batch_size=2, verbose=0)

def save_model(m, filename):
    m.save(filename)

# ----------------------------------------------------------
seed(100)
valueCount: int = 100
maxValue: int = 100
model = Sequential()

# decide whether to load/use existing model or retrain and save changed model
use_existing_model = True
if use_existing_model:
    model = load_model('trained_model_2.h5')
else:
    setup_model(model)
    train_model(model)
    model.save('trained_model_2.h5')

# evaluate model
x, y = create_triples(count=20, max_value=maxValue)
x2 = normalize(x, maxValue)
testresult = model.predict(x2, batch_size=1, verbose=0)

# show results
for i in range(len(testresult)):
    addend = denormalize(x2[i][0], maxValue)
    augend = denormalize(x2[i][1], maxValue)
    total = denormalize(testresult[i][0], maxValue)
    print('{:4d} {:12.6f} {:12.6f} {:12.6f} {:4d}'.format(i, addend, augend, total, y[i]))

print("\r\n")

# do predictions of hand-coded values
x = [[1200, 1343], [1, 1], [-3, -3], [Fraction(16, 5), 3.25]]
x = array(x)
x2 = normalize(x, maxValue)
testresult = model.predict(x2, batch_size=1, verbose=0)

# show results
for i in range(len(testresult)):
    addend = denormalize(x2[i][0], maxValue)
    augend = denormalize(x2[i][1], maxValue)
    total = denormalize(testresult[i][0], maxValue)
    print('{:4d} {:12.6f} {:12.6f} {:12.6f} {:8.2f}'.format(i, addend, augend, total, total))

exit(0)
```