Before we improve our network, we have to take a moment to chat about TensorFlow graphs. As we saw in the previous post, we follow two steps when using TensorFlow:
- Create a computational graph
- Run data through the graph using `session.run()`
Let’s take a look at what’s actually happening when we call `tf.Session.run()`. Consider our graph and session code from last time:
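(What follows is a sketch rather than the exact listing from the previous post; the variable names (`images`, `labels`, `weights`, `biases`, `logits`, `cost`, `optimizer`) and the hyperparameters are assumptions based on the surrounding discussion.)

```python
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# Placeholders for a batch of exactly 100 flattened 28x28 images and their one-hot labels
images = tf.placeholder(tf.float32, [100, 784])
labels = tf.placeholder(tf.float32, [100, 10])

# A single fully-connected layer producing 10 raw scores (logits) per image
weights = tf.Variable(tf.truncated_normal([784, 10], stddev=0.1))
biases = tf.Variable(tf.zeros([10]))
logits = tf.matmul(images, weights) + biases

# Cross-entropy cost, and an optimizer node that updates the variables to reduce it
cost = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))
optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(cost)

with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    for step in range(1000):
        image_batch, label_batch = mnist.train.next_batch(100)
        _, batch_cost = session.run([optimizer, cost],
                                    feed_dict={images: image_batch, labels: label_batch})
```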
When we pass `optimizer` and `cost` to `session.run()`, TensorFlow looks at the dependencies for these two nodes. For example, we can see above that `optimizer` depends on:
- `cost`
- `logits`, `weights`, and `biases`
- `images` and `labels`
We can also see that `cost` depends on:
- `logits` (and therefore on `weights`, `biases`, and `images`)
- `labels`
When we wish to evaluate `cost`, TensorFlow first runs all the operations defined by the previous nodes, then calculates the required results and returns them. Since every node ends up being a dependency of either `cost` or `optimizer`, this means that every operation in our TensorFlow graph is executed with every call to `session.run()`.
But what if we don’t want to run every operation? If we want to pass test data to our network, we don’t want to run the operations defined by `optimizer`. (After all, we don’t want to train our network on our test set!) Instead, we’d just want to extract predictions from `logits`. In that case, we could instead run our network as follows:
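(A sketch, using the assumed node names from above.)

```python
# Run only the logits node: TensorFlow executes just the operations logits depends on,
# so the optimizer never runs and the network is not trained on this data
batch_predictions = session.run(logits, feed_dict={images: image_batch})
```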
This would execute only the subset of nodes required to compute the values of `logits`. Since `labels` is not one of the dependencies of `logits`, we don’t need to provide it.
Understanding the dependencies of the computational graphs we create is important. We should always try to be aware of exactly what operations will be running when we call `session.run()` to avoid accidentally running the wrong operations.
Another important topic to understand is how TensorFlow shapes work. In our previous post, all of our shapes were completely defined. Consider the following placeholder definitions:
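(Again a sketch, reusing the assumed names from above.)

```python
# Both placeholders are locked to a batch dimension of exactly 100 images
images = tf.placeholder(tf.float32, [100, 784])
labels = tf.placeholder(tf.float32, [100, 10])
```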
We have defined these tensors to have a 2-D shape of precisely `(100, 784)` and `(100, 10)`. This restricts us to a computational graph that always expects 100 images at a time. What if we have a training set that isn’t divisible by 100? What if we want to test on single images?
The answer is to use dynamic shapes. In places where we’re not sure what shape we would like to support, we just substitute in `None`. For example, if we want to allow variable batch sizes, we simply write:
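(A sketch, keeping the same assumed placeholder names.)

```python
# None leaves the batch dimension unspecified, so any batch size is accepted at run time
images = tf.placeholder(tf.float32, [None, 784])
labels = tf.placeholder(tf.float32, [None, 10])
```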
Now we can pass in batch sizes of 1, 10, 283, or any other size we’d like. From this point on, we’ll be defining all of our `tf.placeholder`s in this fashion.
One important question remains: “How well is our network doing?” In the previous post, we saw `cost` decreasing, but we had no concrete metric against which we could compare our network. We’ll keep things simple and use accuracy as our metric. We just want to measure the average number of correct predictions:
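(A sketch of that metric; the intermediate names `predictions` and `correct` are assumptions.)

```python
# Turn the raw logits into a probability distribution over the 10 classes
predictions = tf.nn.softmax(logits)
# Compare the predicted class against the true class for every image in the batch
correct = tf.equal(tf.argmax(predictions, 1), tf.argmax(labels, 1))
# Cast the booleans to floats and average them to get the fraction of correct predictions
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
```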
In the first line, we convert `logits` to a set of predictions using `tf.nn.softmax`. Remember that our `labels` are one-hot encoded, meaning each one contains 10 numbers, one of which is 1. `logits` is the same shape, but the values in `logits` can be almost anything (e.g. values in `logits` could be -4, 234, 0.5, and so on). We want our `predictions` to have a few qualities that `logits` does not possess:
- The sum of the values in `predictions` for a given image should be 1
- No value in `predictions` should be greater than 1
- No value in `predictions` should be negative
- The highest value in `predictions` will be our prediction for a given image (we can use `argmax` to find it)
Applying `tf.nn.softmax` to `logits` gives us these desired properties. For more details on softmax, watch this video by Andrew Ng.
The second line takes the `argmax` of our predictions and of our labels. Then `tf.equal` creates a vector that contains `True` when the values match and `False` when they don’t. Finally, we use `tf.reduce_mean` to calculate the average number of times we get the prediction correct for this batch. We store this result in `accuracy`.
Putting it all together
Now that we better understand TensorFlow graphs and shapes, and have a metric with which to judge our algorithm, let’s put it all together to evaluate our performance on the test set after training has finished.
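(The listing below is a sketch of how the full program might look; the network is the same single-layer model assumed above, and the hyperparameters, such as the learning rate, the 1,000 training steps, and the batch size of 100, are assumptions rather than the original values.)

```python
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# Placeholders with a dynamic batch dimension so we can feed batches of any size
images = tf.placeholder(tf.float32, [None, 784])
labels = tf.placeholder(tf.float32, [None, 10])

# A single fully-connected layer producing 10 raw scores (logits) per image
weights = tf.Variable(tf.truncated_normal([784, 10], stddev=0.1))
biases = tf.Variable(tf.zeros([10]))
logits = tf.matmul(images, weights) + biases

# Cost and optimizer
cost = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))
optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(cost)

# Accuracy metric from above
predictions = tf.nn.softmax(logits)
correct = tf.equal(tf.argmax(predictions, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

with tf.Session() as session:
    session.run(tf.global_variables_initializer())

    # Training: run the optimizer on successive batches, printing progress as we go
    for step in range(1000):
        image_batch, label_batch = mnist.train.next_batch(100)
        _, batch_cost, batch_accuracy = session.run(
            [optimizer, cost, accuracy],
            feed_dict={images: image_batch, labels: label_batch})
        if step % 50 == 0:
            print("Cost:", batch_cost, "Accuracy:", batch_accuracy * 100, "%")

    # Testing: run only cost and accuracy (never the optimizer) over the test set in batches
    test_batches = 100  # 10,000 test images, 100 per batch
    total_cost = 0.0
    total_accuracy = 0.0
    for _ in range(test_batches):
        image_batch, label_batch = mnist.test.next_batch(100)
        batch_cost, batch_accuracy = session.run(
            [cost, accuracy],
            feed_dict={images: image_batch, labels: label_batch})
        total_cost += batch_cost
        total_accuracy += batch_accuracy

    print("Test Cost:", total_cost / test_batches,
          "Test accuracy:", total_accuracy / test_batches * 100, "%")
```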
Note that almost all of the new code relates to running the test set.
One question you might ask is: Why not just predict all the test images at once, in one big batch of 10,000? The problem is that when we train larger networks on our GPU, we won’t be able to fit all 10,000 images and the required operations in our GPU’s memory. Instead we have to process the test set in batches similar to how we train the network.
Finally, let’s run it and look at the output. When I run it on my local machine I receive the following:
```
Cost: 20.207457 Accuracy: 7.999999821186066 %
Cost: 10.040323 Accuracy: 14.000000059604645 %
Cost: 8.528659 Accuracy: 14.000000059604645 %
Cost: 6.8867884 Accuracy: 23.999999463558197 %
Cost: 7.1556334 Accuracy: 21.99999988079071 %
Cost: 6.312024 Accuracy: 28.00000011920929 %
Cost: 4.679361 Accuracy: 34.00000035762787 %
Cost: 5.220028 Accuracy: 34.00000035762787 %
Cost: 5.167577 Accuracy: 23.999999463558197 %
Cost: 3.5488296 Accuracy: 40.99999964237213 %
Cost: 3.2974648 Accuracy: 43.00000071525574 %
Cost: 3.532155 Accuracy: 46.99999988079071 %
Cost: 2.9645846 Accuracy: 56.00000023841858 %
Cost: 3.0816755 Accuracy: 46.99999988079071 %
Cost: 3.0201495 Accuracy: 50.999999046325684 %
Cost: 2.7738256 Accuracy: 60.00000238418579 %
Cost: 2.4169116 Accuracy: 55.000001192092896 %
Cost: 1.944017 Accuracy: 60.00000238418579 %
Cost: 3.5998762 Accuracy: 50.0 %
Cost: 2.8526196 Accuracy: 55.000001192092896 %
Test Cost: 2.392377197146416 Test accuracy: 59.48999986052513 %
Press any key to continue . . .
```
So we’re getting a test accuracy of ~60%. This is better than chance, but it’s not as good as we’d like it to be. In the next post, we’ll look at different ways of improving the network.