Part of the series Learn TensorFlow Now

Over the last nine posts, we built a reasonably effective digit classifier. Now we’re ready to enter the big leagues and try out our VGGNet on a more challenging image recognition task. CIFAR-10 (Canadian Institute For Advanced Research) is a collection of 60,000 cropped images of planes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks.

  • 50,000 images in the training set
  • 10,000 images in the test set
  • Size: 32×32 (1024 pixels)
  • 3 Channels (RGB)
  • 10 output classes
Sample images from CIFAR-10

CIFAR-10 is a natural next-step due to its similarities to the MNIST dataset. For starters, we have the same number of training images, testing images and output classes. CIFAR-10’s images are of size 32x32 which is convenient as we were paddding MNIST’s images to achieve the same size. These similarities make it easy to use our previous VGGNet architecture to classify these images.

Despite the similarities, there are some differences that make CIFAR-10 a more challenging image recognition problem. For starters, our images are RGB and therefore have 3 channels. Detecting lines might not be so easy when they can be drawn in any color. Another challenge is that our images are now 2-D depictions of 3-D objects. In the above image, the center two images represent the “truck” class, but are shown at different angles. This means our network has to learn enough about “trucks” to recognize them at angles it has never seen before.

Loading CIFAR-10

The CIFAR-10 dataset is hosted at:

In order to make it easier to work with, I’ve prepared a small script that downloads, shuffles and caches the dataset locally. You can find it on GitHub here.

After saving this file locally, we can use it to prepare our datasets:

Running this locally produces the following output:

Attempting to download:
Download Complete!
(50000, 32, 32, 3)
(10000, 32, 32, 3)
(32, 32, 3)

The above output shows that we’ve downloaded the dataset and created a training set of size 50,000 and a test set of size 10,000. Note: Unlike MNIST, these labels are not 1-hot encoded (otherwise they’d be of size 50,000x10 and 10,000x10 respectively). We have to account for this difference in shape when we build VGGNet for this dataset.

Let’s start by adjusting input and labels to fit the CIFAR-10 dataset:

Next we have to adjust the first layer of our network. Recall from the post on convolutions that each convolutional filter must match the depth of the layer against which it is convolved. Previously we had defined our convolutional filter to be of shape [3, 3, 1, 64]. That is, a 64 3x3 convolutional filters, each with depth of 1, matching the depth of our grayscale input image. Now that we’re using RGB images, we must define it to be of shape [3, 3, 3, 64]:

Another change we must make is the calculation of cost. Previously we were using tf.nn.softmax_cross_entropy_with_logits() which is suitable only when our labels are 1-hot encoded. When we represent the labels as single integers, we can instead use tf.nn.sparse_softmax_cross_entropy_with_logits(). It is otherwise identical to our original softmax cross entropy function.

Finally, we must also modify our calculation of correction_prediction (used to calculate accuracy) to account for the change in label shape. We no longer have to take the tf.argmax of our labels because they’re already represented as a single number:

Note: We have to specify output_type=tf.int32 because tf.argmax() returns tf.int64 by default.

With that, we’ve got everything we need to test our VGGNet on CIFAR-10. The complete code is presented at the end of this post.

After running our network for 10,000 steps, we’re greeted with the following output:

Cost: 470.996
Accuracy: 9.00000035763 %
Cost: 2.00049
Accuracy: 25.0 %
Cost: 0.553867
Accuracy: 82.9999983311 %
Cost: 0.393799
Accuracy: 87.0000004768 %
Test Cost: 0.895597087741
Test accuracy: 70.9400003552 %

Our final test accuracy appears to be approximately 71%, which isn’t too great. On one hand this is disappointing as it means our VGGNet architecture (or the method in which we’re training it) doesn’t generalize very well. On the other hand, CIFAR-10 presents us with new opportunities to try out new neural network components and architectures. In the next few posts we’ll explore some of these approaches to build a neural network that can handle the more complex CIFAR-10 dataset.

If you look carefully at the previous results you may have noticed something interesting. For the first time, our test accuracy (71%) is much lower than our training accuracy (~82-87%). This is a problem we’ll discuss in future posts on bias and variance in deep learning.

Complete Code

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s