- Hands-On Neural Networks with Keras
- Niloy Purkait
Images as numbers
For tasks such as these, we need deep networks with multiple hidden layers if we hope to learn representative features for our output classes. We also need a good dataset on which to practice and to familiarize ourselves with the tools we will be using to design our intelligent systems. Hence, we come to our first hands-on neural network task, introducing the concepts of computer vision, image processing, and hierarchical representation learning. Our task at hand is to teach computers to read numbers not as 0s and 1s, as they already do, but more in the manner in which we read digits written by our fellow humans. We are speaking of handwritten digits, and for this task we will use the iconic MNIST dataset, the true hello world of deep learning datasets. There are good theoretical and practical reasons behind this choice for our first example.
From a theoretical perspective, we need to understand how we can use layers of neurons to progressively learn more complex patterns, much as our own brain does. Since our brain has had about 2,000 to 2,500 years' worth of training data, it has become quite good at identifying complex symbols such as handwritten digits. In fact, we normally perceive this as an absolutely effortless task, since we learn to distinguish between such symbols as early as preschool. But it is actually quite a daunting task. Consider the vast variation in how each of these digits may be written by different humans, and yet our brain is still able to classify them, as if it were much ado about nothing:

While exhaustively coding explicit rules would drive any programmer insane, as we look at the preceding image, our own brain intuitively notices patterns in the data. For example, it picks up on how both 2 and 3 have a half-loop at the top, and how 1, 4, and 7 have a straight downward line. It also perceives how a 4 is actually one downward line, one semi-downward line, and a horizontal line in between. In this way, we are able to break a complex pattern down into smaller patterns, which is especially easy to do with handwritten digits, as we just saw. Our task, therefore, will be to construct a deep neural network in which each neuron captures simple patterns from our data, such as line segments, and deeper layers progressively combine the simple patterns learned by previous layers into more complex ones. In doing so, the network learns the combinations of representations that correspond to our output classes.
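The idea of stacking layers so that each builds on the representations of the one before can be sketched directly in Keras. The layer sizes and activations below are illustrative choices, not prescriptions from the text; all that matters for the argument is that earlier layers feed their learned patterns into later, more abstract ones:

```python
# A minimal sketch of a layered network for 10-class digit recognition,
# assuming TensorFlow's bundled Keras API. Layer widths are arbitrary
# illustrative choices.
from tensorflow import keras

model = keras.Sequential()
# Each 28x28 grayscale image arrives flattened as a 784-dimensional vector.
model.add(keras.Input(shape=(784,)))
# Early layer: intended to pick up simple patterns, such as line segments.
model.add(keras.layers.Dense(128, activation="relu"))
# Deeper layer: combines simpler patterns into more complex shapes.
model.add(keras.layers.Dense(64, activation="relu"))
# Output layer: one probability per digit class, 0 through 9.
model.add(keras.layers.Dense(10, activation="softmax"))

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

Note that nothing in this definition tells the network *which* patterns to learn; the hierarchy of simple-to-complex features emerges during training.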
Practically speaking, the MNIST dataset has been studied for about two decades by many pioneers in the field of deep learning. We have gained a wealth of knowledge from this dataset, making it ideal for exploring concepts such as layer representations, regularization, and overfitting, among others. As soon as we understand how to train and test a neural network on it, we can repurpose our workflow for more exciting tasks.
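One practical reason MNIST is so convenient is that Keras ships a helper to fetch it. A sketch of loading the data and reshaping it for a densely connected network might look like this (assuming TensorFlow's Keras API and an internet connection on first download):

```python
# Sketch: loading MNIST via Keras's built-in dataset helper and
# flattening the images for a dense network.
from tensorflow import keras

# 60,000 training and 10,000 test images, each 28x28 pixels with
# integer intensities in the range 0-255.
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Flatten each image to a 784-dimensional vector and rescale the
# pixel values to the range 0-1.
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0
```

The labels `y_train` and `y_test` are plain integer digit classes, which is why the sparse categorical cross-entropy loss is a natural fit when training on them.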