Before we improve our network, we have to take a moment to chat about TensorFlow graphs. As we saw in the previous post, we follow two steps when using TensorFlow:
- Create a computational graph
- Run data through the graph using `session.run()`
Let’s take a look at what’s actually happening when we call `tf.Session.run()`. Consider our graph and session code from last time:
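(What follows is a sketch rather than the exact listing from the previous post; the variable names (`images`, `labels`, `weights`, `biases`, `logits`, `cost`, `optimizer`) and the hyperparameters are assumptions based on the surrounding discussion.)

```python
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# Placeholders for a batch of exactly 100 flattened 28x28 images and their one-hot labels
images = tf.placeholder(tf.float32, [100, 784])
labels = tf.placeholder(tf.float32, [100, 10])

# A single fully-connected layer producing 10 raw scores (logits) per image
weights = tf.Variable(tf.truncated_normal([784, 10], stddev=0.1))
biases = tf.Variable(tf.zeros([10]))
logits = tf.matmul(images, weights) + biases

# Cross-entropy cost, and an optimizer node that updates the variables to reduce it
cost = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))
optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(cost)

with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    for step in range(1000):
        image_batch, label_batch = mnist.train.next_batch(100)
        _, batch_cost = session.run([optimizer, cost],
                                    feed_dict={images: image_batch, labels: label_batch})
```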
When we pass `optimizer` and `cost` to `session.run()`, TensorFlow looks at the dependencies for these two nodes. For example, we can see above that `optimizer` depends on:
- `cost`
- `logits`, `weights`, and `biases`
- `images` and `labels`
We can also see that `cost` depends on:
- `logits` (and therefore on `weights`, `biases`, and `images`)
- `labels`
When we wish to evaluate `cost`, TensorFlow first runs all the operations defined by the previous nodes, then calculates the required results and returns them. Since every node ends up being a dependency of either `cost` or `optimizer`, this means that every operation in our TensorFlow graph is executed with every call to `session.run()`.
But what if we don’t want to run every operation? If we want to pass test data to our network, we don’t want to run the operations defined by `optimizer`. (After all, we don’t want to train our network on our test set!) Instead, we’d just want to extract predictions from `logits`. In that case, we could instead run our network as follows:
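(A sketch, using the assumed node names from above.)

```python
# Run only the logits node: TensorFlow executes just the operations logits depends on,
# so the optimizer never runs and the network is not trained on this data
batch_predictions = session.run(logits, feed_dict={images: image_batch})
```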
This would execute only the subset of nodes required to compute the values of `logits`. Since `labels` is not one of the dependencies of `logits`, we don’t need to provide it.
Understanding the dependencies of the computational graphs we create is important. We should always try to be aware of exactly what operations will be running when we call `session.run()` to avoid accidentally running the wrong operations.
Another important topic to understand is how TensorFlow shapes work. In our previous post, all of our shapes were completely defined. Consider the following placeholder definitions:
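(Again a sketch, reusing the assumed names from above.)

```python
# Both placeholders are locked to a batch dimension of exactly 100 images
images = tf.placeholder(tf.float32, [100, 784])
labels = tf.placeholder(tf.float32, [100, 10])
```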
We have defined these tensors to have a 2-D shape of precisely `(100, 784)` and `(100, 10)`. This restricts us to a computational graph that always expects 100 images at a time. What if we have a training set that isn’t divisible by 100? What if we want to test on single images?
The answer is to use dynamic shapes. In places where we’re not sure what shape we would like to support, we just substitute in `None`. For example, if we want to allow variable batch sizes, we simply write:
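(A sketch, keeping the same assumed placeholder names.)

```python
# None leaves the batch dimension unspecified, so any batch size is accepted at run time
images = tf.placeholder(tf.float32, [None, 784])
labels = tf.placeholder(tf.float32, [None, 10])
```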
Now we can pass in batch sizes of 1, 10, 283, or any other size we’d like. From this point on, we’ll be defining all of our `tf.placeholder`s in this fashion.
One important question remains: “How well is our network doing?” In the previous post, we saw `cost` decreasing, but we had no concrete metric against which we could compare our network. We’ll keep things simple and use accuracy as our metric. We just want to measure the average number of correct predictions:
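(A sketch of that metric; the intermediate names `predictions` and `correct` are assumptions.)

```python
# Turn the raw logits into a probability distribution over the 10 classes
predictions = tf.nn.softmax(logits)
# Compare the predicted class against the true class for every image in the batch
correct = tf.equal(tf.argmax(predictions, 1), tf.argmax(labels, 1))
# Cast the booleans to floats and average them to get the fraction of correct predictions
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
```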
In the first line, we convert `logits` to a set of predictions using `tf.nn.softmax`. Remember that our `labels` are one-hot encoded, meaning each one contains 10 numbers, one of which is 1. `logits` is the same shape, but the values in `logits` can be almost anything (e.g. values in `logits` could be -4, 234, 0.5, and so on). We want our `predictions` to have a few qualities that `logits` does not possess:
- The sum of the values in `predictions` for a given image should be 1
- No value in `predictions` should be greater than 1
- No value in `predictions` should be negative
- The highest value in `predictions` will be our prediction for a given image (we can use `argmax` to find it)
Applying `tf.nn.softmax` to `logits` gives us these desired properties. For more details on softmax, watch this video by Andrew Ng.
The second line takes the `argmax` of our predictions and of our labels. Then `tf.equal` creates a vector that contains `True` when the values match and `False` when they don’t. Finally, we use `tf.reduce_mean` to calculate the average number of times we get the prediction correct for this batch. We store this result in `accuracy`.
Putting it all together
Now that we better understand TensorFlow graphs and shapes, and have a metric with which to judge our algorithm, let’s put it all together to evaluate our performance on the test set after training has finished.
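(The listing below is a sketch of how the full program might look; the network is the same single-layer model assumed above, and the hyperparameters, such as the learning rate, the 1,000 training steps, and the batch size of 100, are assumptions rather than the original values.)

```python
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# Placeholders with a dynamic batch dimension so we can feed batches of any size
images = tf.placeholder(tf.float32, [None, 784])
labels = tf.placeholder(tf.float32, [None, 10])

# A single fully-connected layer producing 10 raw scores (logits) per image
weights = tf.Variable(tf.truncated_normal([784, 10], stddev=0.1))
biases = tf.Variable(tf.zeros([10]))
logits = tf.matmul(images, weights) + biases

# Cost and optimizer
cost = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))
optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(cost)

# Accuracy metric from above
predictions = tf.nn.softmax(logits)
correct = tf.equal(tf.argmax(predictions, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

with tf.Session() as session:
    session.run(tf.global_variables_initializer())

    # Training: run the optimizer on successive batches, printing progress as we go
    for step in range(1000):
        image_batch, label_batch = mnist.train.next_batch(100)
        _, batch_cost, batch_accuracy = session.run(
            [optimizer, cost, accuracy],
            feed_dict={images: image_batch, labels: label_batch})
        if step % 50 == 0:
            print("Cost:", batch_cost, "Accuracy:", batch_accuracy * 100, "%")

    # Testing: run only cost and accuracy (never the optimizer) over the test set in batches
    test_batches = 100  # 10,000 test images, 100 per batch
    total_cost = 0.0
    total_accuracy = 0.0
    for _ in range(test_batches):
        image_batch, label_batch = mnist.test.next_batch(100)
        batch_cost, batch_accuracy = session.run(
            [cost, accuracy],
            feed_dict={images: image_batch, labels: label_batch})
        total_cost += batch_cost
        total_accuracy += batch_accuracy

    print("Test Cost:", total_cost / test_batches,
          "Test accuracy:", total_accuracy / test_batches * 100, "%")
```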
Note that almost all of the new code relates to running the test set.
One question you might ask is: Why not just predict all the test images at once, in one big batch of 10,000? The problem is that when we train larger networks on our GPU, we won’t be able to fit all 10,000 images and the required operations in our GPU’s memory. Instead we have to process the test set in batches similar to how we train the network.
Finally, let’s run it and look at the output. When I run it on my local machine I receive the following:
```
Cost: 20.207457 Accuracy: 7.999999821186066 %
Cost: 10.040323 Accuracy: 14.000000059604645 %
Cost: 8.528659 Accuracy: 14.000000059604645 %
Cost: 6.8867884 Accuracy: 23.999999463558197 %
Cost: 7.1556334 Accuracy: 21.99999988079071 %
Cost: 6.312024 Accuracy: 28.00000011920929 %
Cost: 4.679361 Accuracy: 34.00000035762787 %
Cost: 5.220028 Accuracy: 34.00000035762787 %
Cost: 5.167577 Accuracy: 23.999999463558197 %
Cost: 3.5488296 Accuracy: 40.99999964237213 %
Cost: 3.2974648 Accuracy: 43.00000071525574 %
Cost: 3.532155 Accuracy: 46.99999988079071 %
Cost: 2.9645846 Accuracy: 56.00000023841858 %
Cost: 3.0816755 Accuracy: 46.99999988079071 %
Cost: 3.0201495 Accuracy: 50.999999046325684 %
Cost: 2.7738256 Accuracy: 60.00000238418579 %
Cost: 2.4169116 Accuracy: 55.000001192092896 %
Cost: 1.944017 Accuracy: 60.00000238418579 %
Cost: 3.5998762 Accuracy: 50.0 %
Cost: 2.8526196 Accuracy: 55.000001192092896 %
Test Cost: 2.392377197146416 Test accuracy: 59.48999986052513 %
Press any key to continue . . .
```
So we’re getting a test accuracy of ~60%. This is better than chance, but it’s not as good as we’d like it to be. In the next post, we’ll look at different ways of improving the network.