- Hands-On Neural Networks with Keras
- Niloy Purkait
A summary of MNIST
So far in our journey, you were introduced to the fundamental learning mechanisms and processes that govern a neural network's functionality. You learned that neural networks need tensor representations of input data in order to process it for predictive use cases. You also learned how the different types of data found in our world, such as images, videos, and text, can be represented as n-dimensional tensors. Furthermore, you saw how to implement a sequential model in Keras, which essentially lets you stack layers of interconnected neurons one after another. You used this model structure to construct a simple feedforward neural network for the task of classifying handwritten digits from the MNIST dataset. In doing so, you learned about the key architectural decisions to consider at each stage of model development.
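As a refresher, the following is a minimal sketch of that workflow using the tf.keras API. The layer sizes and preprocessing steps here are illustrative, not necessarily the exact values used earlier in the chapter:

```python
from tensorflow.keras import models, layers
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

# Load MNIST and reshape the 28 x 28 images into flat
# 784-dimensional vectors, scaled to the [0, 1] range
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape((60000, 784)).astype('float32') / 255
x_test = x_test.reshape((10000, 784)).astype('float32') / 255
y_train, y_test = to_categorical(y_train), to_categorical(y_test)

# A simple feedforward network: one hidden layer, plus an
# output layer with one neuron per digit class (0-9)
model = models.Sequential()
model.add(layers.Dense(512, activation='relu', input_shape=(784,)))
model.add(layers.Dense(10, activation='softmax'))
```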
During model construction, the main decisions pertain to defining the correct input size for your data, choosing a relevant activation function per layer, and setting the number of output neurons in your last layer according to the number of output classes in your data. During the compilation process, you got to choose the optimization technique, the loss function, and a metric to monitor your training progress. Then, you initiated the training session of your newly minted model by calling the .fit() method, passing it the final two architectural decisions to be made before the training procedure begins. These decisions pertained to the batch size, that is, how many samples the model sees at a time, and the total number of epochs to train the model for.
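Continuing the sketch above, compilation and training might look as follows. The optimizer, batch size, and epoch count are plausible defaults, not values prescribed by the chapter:

```python
# Compilation: pick an optimizer, a loss function, and a
# metric to monitor during training
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Training: batch_size and epochs are the final two decisions
model.fit(x_train, y_train, batch_size=128, epochs=5)
```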
Finally, you saw how to test your predictions, and learned about the pivotal concept of regularization. We concluded this classification task by experimenting with regularization techniques: reducing our model's size, penalizing its layer weights, and adding dropout layers, all of which helped improve the generalizability of our model to unseen data. Lastly, we saw that increasing model complexity is unfavourable unless it is explicitly required by the nature of our task. A short sketch of these regularization techniques follows the exercises below:
- Exercise x: Initialize the network's weight parameters differently and see how this affects model performance
- Exercise y: Initialize different weights for each layer and see how this affects model performance
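As a starting point for both the exercises and the regularization discussion above, here is a minimal sketch of a smaller, regularized version of the model. The penalty strength, dropout rate, and the he_normal initializer are illustrative assumptions, not values fixed by the chapter:

```python
from tensorflow.keras import models, layers, regularizers

# A smaller network with L2 weight penalties and dropout; the
# kernel_initializer argument is the hook for the exercises above
regularized = models.Sequential()
regularized.add(layers.Dense(64, activation='relu',
                             kernel_regularizer=regularizers.l2(0.001),
                             kernel_initializer='he_normal',
                             input_shape=(784,)))
regularized.add(layers.Dropout(0.5))  # randomly zero half the activations
regularized.add(layers.Dense(10, activation='softmax'))

regularized.compile(optimizer='rmsprop',
                    loss='categorical_crossentropy',
                    metrics=['accuracy'])
regularized.fit(x_train, y_train, batch_size=128, epochs=5)

# Check generalizability on unseen test data
test_loss, test_acc = regularized.evaluate(x_test, y_test)
```

Swapping in a different string for kernel_initializer (for example, 'glorot_uniform' or 'random_normal') per layer is one way to carry out the two exercises and compare the resulting test accuracy.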