Part of the series Learn TensorFlow Now
TensorFlow Graphs
Before we improve our network, we have to take a moment to chat about TensorFlow graphs. As we saw in the previous post, we follow two steps when using TensorFlow:
- Create a computational graph
- Run data through the graph using tf.Session.run()
Let’s take a look at what’s actually happening when we call tf.Session.run(). Consider our graph and session code from last time:
o, c = session.run([optimizer, cost], feed_dict=feed_dict)
When we pass optimizer and cost to session.run(), TensorFlow looks at the dependencies for these two nodes. For example, we can see above that optimizer depends on:
- cost
- layer1_weights
- layer1_bias
- input
We can also see that cost depends on:
- logits
- labels
When we wish to evaluate optimizer and cost, TensorFlow first runs all the operations defined by the nodes they depend on, then calculates the required results and returns them. Since every node ends up being a dependency of optimizer and cost, this means that every operation in our TensorFlow graph is executed with every call to session.run().
But what if we don’t want to run every operation? If we want to pass test data to our network, we don’t want to run the operations defined by optimizer. (After all, we don’t want to train our network on our test set!) Instead, we’d just want to extract predictions from logits. In that case, we could instead run our network as follows:
batch_images = test_images[offset:(offset + batch_size), :]  # Note: test images
feed_dict = {input: batch_images}  # Note: No labels
l = session.run([logits], feed_dict=feed_dict)  # Only asking for logits
This would execute only the subset of nodes required to compute the values of logits, highlighted below:
Figure: the dependencies of logits highlighted in orange.
Note: As labels is not one of the dependencies of logits, we don’t need to provide it.
Understanding the dependencies of the computational graphs we create is important. We should always try to be aware of exactly what operations will be running when we call session.run() to avoid accidentally running the wrong operations.
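To make the idea of "only the required subset runs" concrete, here is a minimal plain-Python sketch. It is not TensorFlow code and does not reflect how TensorFlow prunes graphs internally. The GRAPH dictionary and the required_nodes helper are hypothetical names, and the edges are written by hand from the dependencies listed above (the inputs of logits are taken from its definition as a matrix multiply of input and layer1_weights plus layer1_bias).

```python
# Hand-written model of this post's graph: each node maps to the nodes it
# reads from. This is an illustration only, not TensorFlow's internal format.
GRAPH = {
    "input": [],
    "labels": [],
    "layer1_weights": [],
    "layer1_bias": [],
    "logits": ["input", "layer1_weights", "layer1_bias"],
    "cost": ["logits", "labels"],
    "optimizer": ["cost", "layer1_weights", "layer1_bias", "input"],
}


def required_nodes(fetches, graph=GRAPH):
    """Return every node that must be evaluated to compute the fetches."""
    needed = set()
    stack = list(fetches)
    while stack:
        node = stack.pop()
        if node not in needed:
            needed.add(node)
            # Everything this node reads from is needed as well.
            stack.extend(graph[node])
    return needed


# Fetching optimizer and cost pulls in the whole graph.
print(sorted(required_nodes(["optimizer", "cost"])))

# Fetching only logits leaves out labels, cost and optimizer.
print(sorted(required_nodes(["logits"])))
```

In this toy model, asking for logits never reaches labels, which mirrors why the feed_dict for the test run did not need to supply it.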
Shapes
Another important topic to understand is how TensorFlow shapes work. In our previous post all our shapes were completely defined. Consider the following tf.placeholder definitions for input and labels:
input = tf.placeholder(tf.float32, shape=(100, 784))
labels = tf.placeholder(tf.float32, shape=(100, 10))
We have defined these tensors to have a 2-D shape of precisely (100, 784) and (100, 10). This restricts us to a computational graph that always expects 100 images at a time. What if we have a training set that isn’t divisible by 100? What if we want to test on single images?
The answer is to use dynamic shapes. In places where we’re not sure what shape we would like to support, we just substitute in None. For example, if we want to allow variable batch sizes, we simply write:
input = tf.placeholder(tf.float32, shape=(None, 784))
labels = tf.placeholder(tf.float32, shape=(None, 10))
Now we can pass in batch sizes of 1, 10, 283 or any other size we’d like. From this point on, we’ll be defining all of our tf.placeholder nodes in this fashion.
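One way to think about None in a shape is as a wildcard for that dimension. The sketch below is plain Python rather than TensorFlow; is_compatible is a hypothetical helper written only to illustrate the rule, not a function from the library.

```python
def is_compatible(declared_shape, actual_shape):
    """Check a concrete shape against a declared shape.

    A None entry in declared_shape accepts any size in that position;
    every other entry must match exactly.
    """
    if len(declared_shape) != len(actual_shape):
        return False
    return all(d is None or d == a
               for d, a in zip(declared_shape, actual_shape))


fixed = (100, 784)     # fully defined shape from the previous post
dynamic = (None, 784)  # variable batch size

print(is_compatible(fixed, (100, 784)))    # True
print(is_compatible(fixed, (1, 784)))      # False: batch size must be 100
print(is_compatible(dynamic, (1, 784)))    # True
print(is_compatible(dynamic, (283, 784)))  # True
print(is_compatible(dynamic, (283, 10)))   # False: 784 is still required
```

Only the batch dimension is left open here; the number of pixels per image stays fixed at 784.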
Accuracy
One important question remains: “How well is our network doing?” In the previous post, we saw cost decreasing, but we had no concrete metric against which we could compare our network. We’ll keep things simple and use accuracy as our metric. We just want to measure the fraction of correct predictions:
predictions = tf.nn.softmax(logits)
correct_prediction = tf.equal(tf.argmax(labels, 1), tf.argmax(predictions, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
In the first line, we convert logits to a set of predictions using tf.nn.softmax. Remember that our labels are 1-hot encoded, meaning each one contains 10 numbers, one of which is 1. logits is the same shape, but the values in logits can be almost anything (e.g. values in logits could be -4, 234, 0.5 and so on). We want our predictions to have a few qualities that logits does not possess:
- The sum of the values in predictions for a given image should be 1
- No values in predictions should be greater than 1
- No values in predictions should be negative
- The position of the highest value in predictions will be our prediction for a given image. (We can use argmax to find this)
Applying tf.nn.softmax() to logits gives us these desired properties. For more details on softmax, watch this video by Andrew Ng.
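The properties listed above can be checked by hand with a small plain-Python version of softmax. This is a sketch of the standard formula (exponentiate, then normalize), not TensorFlow's implementation; the function name and the three example values are made up for illustration, and subtracting the maximum is a common trick to avoid overflow.

```python
import math


def softmax(logits):
    """Softmax over a list of numbers.

    Subtracting the largest value first does not change the result but
    keeps math.exp from overflowing on large logits.
    """
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [-4.0, 2.0, 0.5]  # arbitrary example values for three classes
predictions = softmax(logits)

print(predictions)
print("sum:", sum(predictions))                         # approximately 1
print("all in [0, 1]:", all(0.0 <= p <= 1.0 for p in predictions))
print("argmax:", predictions.index(max(predictions)))   # index of the largest logit
```

Because softmax preserves the ordering of its inputs, the argmax of predictions is the same as the argmax of logits.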
The second line takes the argmax of our predictions and of our labels. Then tf.equal creates a vector that contains True where the values match and False where they don’t.
Finally, we use tf.cast to convert these booleans to 1.0 and 0.0, and tf.reduce_mean to calculate the fraction of predictions we get correct for this batch. We store this result in accuracy.
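The same three steps (argmax, compare, average) can be traced by hand in plain Python. This is only an illustration of the arithmetic, not TensorFlow code; the helper names are hypothetical, and the toy batch uses 3 classes instead of 10 to keep it short.

```python
def argmax(values):
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def batch_accuracy(labels, predictions):
    """Fraction of rows where the predicted class matches the label."""
    correct = [argmax(l) == argmax(p) for l, p in zip(labels, predictions)]
    # Cast booleans to 1.0 / 0.0, then take the mean.
    return sum(1.0 if c else 0.0 for c in correct) / len(correct)


# Toy batch of four images with 1-hot labels.
labels = [
    [0, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]
predictions = [
    [0.1, 0.2, 0.7],  # predicts class 2, correct
    [0.6, 0.3, 0.1],  # predicts class 0, correct
    [0.5, 0.4, 0.1],  # predicts class 0, label is class 1
    [0.2, 0.2, 0.6],  # predicts class 2, correct
]

print(batch_accuracy(labels, predictions))  # 0.75
```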
Putting it all together
Now that we better understand TensorFlow graphs and shapes, and have a metric with which to judge our algorithm, let’s put it all together to evaluate our performance on the test set after training has finished.
Note that almost all of the new code relates to running the test set.
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
train_images = mnist.train.images
train_labels = mnist.train.labels
test_images = mnist.test.images
test_labels = mnist.test.labels

graph = tf.Graph()
with graph.as_default():
    input = tf.placeholder(tf.float32, shape=(None, 784))
    labels = tf.placeholder(tf.float32, shape=(None, 10))

    layer1_weights = tf.Variable(tf.random_normal([784, 10]))
    layer1_bias = tf.Variable(tf.zeros([10]))

    logits = tf.matmul(input, layer1_weights) + layer1_bias
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))

    learning_rate = 0.01
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

    # Add a few nodes to calculate accuracy and optionally retrieve predictions
    predictions = tf.nn.softmax(logits)
    correct_prediction = tf.equal(tf.argmax(labels, 1), tf.argmax(predictions, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()

    num_steps = 2000
    batch_size = 100

    for step in range(num_steps):
        # Wrap around the training set so every batch is full
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch_images = train_images[offset:(offset + batch_size), :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        feed_dict = {input: batch_images, labels: batch_labels}

        _, c, acc = session.run([optimizer, cost, accuracy], feed_dict=feed_dict)

        if step % 100 == 0:
            print("Cost: ", c)
            print("Accuracy: ", acc * 100.0, "%")

    # Test
    num_test_batches = len(test_images) // batch_size
    total_accuracy = 0
    total_cost = 0
    for step in range(num_test_batches):
        # Walk through the test set once, one batch at a time
        offset = step * batch_size
        batch_images = test_images[offset:(offset + batch_size), :]
        batch_labels = test_labels[offset:(offset + batch_size), :]
        feed_dict = {input: batch_images, labels: batch_labels}

        # Note that we do not pass in optimizer here.
        c, acc = session.run([cost, accuracy], feed_dict=feed_dict)
        total_cost = total_cost + c
        total_accuracy = total_accuracy + acc

    print("Test Cost: ", total_cost / num_test_batches)
    print("Test accuracy: ", total_accuracy * 100.0 / num_test_batches, "%")
One question you might ask is: Why not just predict all the test images at once, in one big batch of 10,000? The problem is that when we train larger networks on our GPU, we won’t be able to fit all 10,000 images and the required operations in our GPU’s memory. Instead we have to process the test set in batches similar to how we train the network.
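The MNIST test set happens to divide evenly into batches of 100, so averaging the per-batch accuracies gives the exact overall accuracy. When the last batch is smaller, each batch should be weighted by its size. The plain-Python sketch below shows one way to do that bookkeeping; iter_batches and weighted_mean are hypothetical helpers, and the per-batch accuracy values are made-up numbers rather than results from this network.

```python
def iter_batches(num_examples, batch_size):
    """Yield (start, end) index pairs that cover every example exactly once."""
    for start in range(0, num_examples, batch_size):
        yield start, min(start + batch_size, num_examples)


def weighted_mean(batch_values, batch_sizes):
    """Average per-batch values, weighting each batch by its size."""
    total = sum(v * n for v, n in zip(batch_values, batch_sizes))
    return total / sum(batch_sizes)


# 250 examples in batches of 100: the last batch holds only 50 examples.
batches = list(iter_batches(250, 100))
print(batches)  # [(0, 100), (100, 200), (200, 250)]

sizes = [end - start for start, end in batches]
accuracies = [0.5, 0.75, 1.0]  # made-up per-batch accuracies
print(weighted_mean(accuracies, sizes))
```

A partial final batch like this only works with placeholders whose batch dimension is None, as defined earlier in the post.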
Finally, let’s run it and look at the output. When I run it on my local machine I receive the following:
Cost: 20.207457
Accuracy: 7.999999821186066 %
Cost: 10.040323
Accuracy: 14.000000059604645 %
Cost: 8.528659
Accuracy: 14.000000059604645 %
Cost: 6.8867884
Accuracy: 23.999999463558197 %
Cost: 7.1556334
Accuracy: 21.99999988079071 %
Cost: 6.312024
Accuracy: 28.00000011920929 %
Cost: 4.679361
Accuracy: 34.00000035762787 %
Cost: 5.220028
Accuracy: 34.00000035762787 %
Cost: 5.167577
Accuracy: 23.999999463558197 %
Cost: 3.5488296
Accuracy: 40.99999964237213 %
Cost: 3.2974648
Accuracy: 43.00000071525574 %
Cost: 3.532155
Accuracy: 46.99999988079071 %
Cost: 2.9645846
Accuracy: 56.00000023841858 %
Cost: 3.0816755
Accuracy: 46.99999988079071 %
Cost: 3.0201495
Accuracy: 50.999999046325684 %
Cost: 2.7738256
Accuracy: 60.00000238418579 %
Cost: 2.4169116
Accuracy: 55.000001192092896 %
Cost: 1.944017
Accuracy: 60.00000238418579 %
Cost: 3.5998762
Accuracy: 50.0 %
Cost: 2.8526196
Accuracy: 55.000001192092896 %
Test Cost: 2.392377197146416
Test accuracy: 59.48999986052513 %
So we’re getting a test accuracy of ~60%. This is better than chance, but it’s not as good as we’d like it to be. In the next post, we’ll look at different ways of improving the network.