An Introduction to TensorFlow and Keras
In this section, both frameworks will be presented, giving you a general overview of their architecture, the fundamental elements they are composed of, and some of their typical applications.
TensorFlow
TensorFlow is an open source numerical computation software library that leverages data flow computational graphs. Its architecture allows users to run it on a wide variety of hardware: from CPUs to Tensor Processing Units (TPUs), including GPUs as well as mobile and embedded platforms. The main difference between CPUs, GPUs, and TPUs is the speed at which they perform the core computations (multiplications and additions) and the types of data they can operate on, which, of course, is of primary importance when aiming for maximum performance.
Note
We will be looking at various code implementation examples for TensorFlow in the Keras section of this chapter.
You can refer to the official documentation of TensorFlow for more information here: https://www.tensorflow.org/
The following article is a very good reference if you wish to find out more about the differences between GPUs and TPUs: https://iq.opengenus.org/cpu-vs-gpu-vs-tpu/
TensorFlow is based on a high-performance core implemented in C++, wrapped by a distributed execution engine that works as an abstraction toward the many devices it supports. We will be using TensorFlow 2, which has recently been released and represents a major milestone for TensorFlow. Its main differences with respect to version 1 relate to its greater ease of use, particularly for model building. In fact, Keras has become the lead tool for easily creating models and experimenting with them. TensorFlow 2 uses eager execution by default. This allowed the creators of TensorFlow to eliminate the previous, more complex workflow, which was based on constructing a computational graph that is then run in a session. With eager execution, this is no longer required. Finally, the data pipeline has been simplified by means of the TensorFlow Dataset API, a common interface for ingesting standard or custom datasets with no need to define placeholders.
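As a small illustration of this last point, the following minimal sketch (the NumPy arrays are toy data invented for the example) shows how a tf.data.Dataset can be built directly from in-memory arrays, shuffled, and batched, with no placeholders involved:

import numpy as np
import tensorflow as tf

# Toy data: 1,000 samples with 100 features each, plus integer class labels
features = np.random.rand(1000, 100).astype("float32")
labels = np.random.randint(0, 10, size=(1000,))

# Build a dataset, shuffle it, and group it into mini-batches
dataset = tf.data.Dataset.from_tensor_slices((features, labels))
dataset = dataset.shuffle(buffer_size=1000).batch(32)

# The dataset can be iterated over directly or passed to model.fit()
for batch_features, batch_labels in dataset.take(1):
    print(batch_features.shape, batch_labels.shape)   # (32, 100) (32,)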
The execution engine is then interfaced with Python and C++ frontends, which, in turn, are the basis for the Layers API, which provides a simple interface for common layers in deep learning models. This hierarchical structure continues with higher-level APIs, including Keras (which we will describe later in this section). Finally, a set of common models are provided and can be used out of the box.
The following diagram provides an overview of how different TensorFlow modules are hierarchically organized, starting from the low level (bottom) up to the highest level (top):
The historical execution model of TensorFlow was based on computational graphs. Using this approach, the first step when building a model is to create a computation graph that fully describes the calculations we want to perform. The second step is to execute it. This approach has the drawback of being less intuitive than common imperative implementations, where computations run as they are written rather than only after the full graph has been defined. At the same time, it provides several advantages, making the algorithm highly portable, deployable on different types of hardware platforms, and capable of running in parallel on multiple instances.
In the latest version of TensorFlow (starting with v. 1.7), a new execution model called "eager execution" has been introduced. This is an imperative style for writing code. With eager execution enabled, all algorithmic operations can be run immediately, with no need to build a graph first and then execute it. This new approach has been greeted with enthusiasm and has some very important pros: first, it is much simpler to inspect and debug algorithms and access intermediate values; it is possible to directly use a Python control flow inside TensorFlow APIs; and it makes building and training complex algorithms very easy.
In addition, once the model that has been created using eager execution satisfies requirements, it is possible to automatically convert it into a graph, which makes it possible to leverage all the advantages we looked at previously, such as saving, porting, and distributing models optimally.
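To make this concrete, here is a minimal sketch (the function and the constant values are invented for illustration): operations run eagerly and return results immediately, while decorating a Python function with tf.function traces it into a graph, recovering the portability and optimization benefits described above:

import tensorflow as tf

# With eager execution (the default in TensorFlow 2), operations run
# immediately and their results can be inspected like ordinary values
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[1.0, 1.0], [0.0, 1.0]])
print(tf.matmul(a, b))

# Decorating a Python function with tf.function converts it into a
# callable graph when it is first traced
@tf.function
def dense_step(x, w, bias):
    return tf.nn.relu(tf.matmul(x, w) + bias)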
Like other machine learning frameworks, TensorFlow provides a large number of ready-to-use models and for many of them, it also provides trained model weights along with the model graph, meaning we can run such models out of the box, and even tune them for a specific use case to take advantage of techniques such as transfer learning with fine tuning. We will cover these in the following sections.
The models provided cover a wide range of different applications, for example:
- Image classification: Able to classify images into categories.
- Object detection: Capable of detecting and localizing multiple objects in images.
- Language understanding and translation: Performing natural language processing for tasks such as word prediction and translation.
- Patch harmonization and style transfer: The algorithm is able to apply a given style (represented, for example, through a painting) to a given photo (refer to the following example).
As we mentioned previously, many of the models include trained weights and examples explaining how to use them. This makes it very straightforward to adopt "transfer learning," that is, to take advantage of these pretrained models by creating new ones from them and retraining only a part of the network on a new dataset. This new dataset can be significantly smaller than the one that would be required to train the entire network from scratch.
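As an illustrative sketch of what this typically looks like in TensorFlow 2 (the input size, the number of target classes, and the new_dataset name are arbitrary choices for this example), a pretrained backbone can be loaded with its ImageNet weights, frozen, and extended with a new classification head that is then trained on the new data:

import tensorflow as tf

# Load a MobileNetV2 backbone pretrained on ImageNet, without its classifier
base_model = tf.keras.applications.MobileNetV2(input_shape=(224, 224, 3), \
                                               include_top=False, \
                                               weights='imagenet')
base_model.trainable = False   # freeze the pretrained weights

# Stack a new classification head for, say, 5 target classes on top
model = tf.keras.Sequential([base_model, \
                             tf.keras.layers.GlobalAveragePooling2D(), \
                             tf.keras.layers.Dense(5, activation='softmax')])
model.compile(optimizer='adam', loss='categorical_crossentropy', \
              metrics=['accuracy'])
# model.fit(new_dataset, epochs=5)   # only the new head is trained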
TensorFlow models can also be deployed on mobile devices. After being trained on large systems, they are optimized to reduce their footprint so that platform limitations can be met. For example, the TensorFlow project known as MobileNet is developing a set of computer vision models specifically designed with optimal speed/accuracy trade-offs in mind; these are typically intended for embedded devices and mobile applications.
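As a hint of what this optimization step can look like, the following is a minimal sketch, assuming a trained Keras model simply named model: TensorFlow Lite provides a converter that produces a compact model file suitable for mobile and embedded deployment:

import tensorflow as tf

# Convert a trained Keras model to the TensorFlow Lite format
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]   # e.g., weight quantization
tflite_model = converter.convert()

# Save the compact model file for deployment on a mobile or embedded device
with open('model.tflite', 'wb') as f:
    f.write(tflite_model)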
The following image represents a typical example of an object detection application where the input image is processed and three objects have been detected, localized, and classified:
The following image shows how style transfer works: the style of the famous painting "The Great Wave off Kanagawa" has been applied to a photo of the Seattle skyline. The results keep the key parts of the picture (the majority of the buildings are there, mountains, and so on), but it is represented through stylistic elements that have been extrapolated from the reference image:
Now, let's learn about Keras.
Keras
Building deep learning models is quite complex, especially when we have to deal with all the typical low-level aspects of major frameworks, and this is one of the most relevant barriers for newcomers in the machine learning field. As an example, the following code shows how to create a simple neural network (one hidden layer with an input size of 100 and an output size of 10) with a low-level TensorFlow API.
In the following code snippet, two functions are being defined. The first builds the weights matrix of a network layer, while the second one creates the bias vector:
def weight_variable(shape):
    # Weights are initialized with small random values from a truncated normal
    shape = tf.TensorShape(shape)
    initial_values = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial_values)

def bias_variable(shape):
    # Biases are initialized to zero
    initial_values = tf.zeros(tf.TensorShape(shape))
    return tf.Variable(initial_values)
Next, the placeholders for the input (X) and labels (y) are created. They will contain the training samples that will be used to fit the model:
# Define placeholders
X = tf.placeholder(tf.float32, shape=[None, 100])
y = tf.placeholder(tf.int32, shape=[None, 10])
Two matrices and two vectors are created with the functions previously defined, one pair for each of the two layers of the network (the hidden layer and the output layer). These will contain the trainable parameters (network weights):
# Define variables
w1 = weight_variable([X.shape[1], 64])
b1 = bias_variable([64])
w2 = weight_variable([64, 10])
b2 = bias_variable([10])
The two network layers are defined via their mathematical definition: a matrix multiplication plus the bias, with the activation function applied to the result:
# Define network
# Hidden layer
z1 = tf.add(tf.matmul(X, w1), b1)
a1 = tf.nn.relu(z1)
# Output layer
z2 = tf.add(tf.matmul(a1, w2), b2)
y_pred = tf.nn.softmax(z2)
The loss function is defined, the optimizer is initialized, and the training metrics are chosen. Finally, the graph is run to perform training:
# Define loss function (softmax_cross_entropy expects the raw logits z2)
loss = tf.losses.softmax_cross_entropy(y, z2, \
                                       reduction=tf.losses.Reduction.MEAN)
# Define optimizer
optimizer = tf.train.AdamOptimizer(0.01).minimize(loss)
# Metric
accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.argmax(y, axis=1), \
tf.argmax(y_pred, axis=1)), tf.float32))
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(n_epochs):
        sess.run(optimizer, feed_dict={X: X_train, y: y_train})
As you can see, we need to manually manage many different aspects: variable declaration, weights initialization, layer creation, layer-related mathematical operations, and the definition of the loss function, optimizers, and metrics. For comparison, the same neural network will be created using Keras later in this section.
Note
The preceding code snippet is an example that demonstrates how to implement a simple fully connected neural network with a TensorFlow low-level API. In Exercise 3.01, Building a Sequential Model with the Keras High-Level API, you will see how much more straightforward it is to do the same job using a Keras high-level API.
Among many different proposals, Keras has become one of the main references for high-level APIs, especially in the context of those targeted at creating neural networks. It is written in Python and can be interfaced with different backend computation engines, one of which is, of course, TensorFlow.
Note
You can refer to the official documentation for further reading on Keras here: https://keras.io/.
Keras' conception has been driven by some clear principles, in particular modularity, user-friendliness, easy extensibility, and straightforward integration with Python. Its aim is to favor adoption by newcomers and non-expert users, and it presents a very gentle learning curve. It provides many different standalone modules, ranging from neural network layers to optimizers, and from initialization schemes to cost functions, which can easily be combined to build deep learning models quickly and code them directly in Python, with no need for separate configuration files. Given these features, its wide adoption, the fact that it can be interfaced with a large number of different backend engines (for example, TensorFlow, CNTK, Theano, MXNet, and PlaidML), and its wide choice of deployment options, it has become the standard choice in the field.
Since it doesn't have its own low-level implementation, Keras needs to rely on an external backend engine. This can easily be changed by editing (for Linux users) the $HOME/.keras/keras.json file, where it is possible to specify the backend name. It is also possible to specify it by means of the KERAS_BACKEND environment variable.
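For reference, a typical $HOME/.keras/keras.json file for the standalone, multi-backend version of Keras looks similar to the following, with the backend field selecting the computation engine (the other fields shown are the usual defaults):

{
    "floatx": "float32",
    "epsilon": 1e-07,
    "backend": "tensorflow",
    "image_data_format": "channels_last"
}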
Keras' fundamental class is Model. There are two different types of model available: the sequential model (which we will use extensively), and the Model class, which is used with the functional API.
The sequential model can be seen as a linear stack of layers, piled one after the other in a very simple way, and these layers can be described very easily. The following exercise shows how short a Keras Python script can be when model.add() is used to define the two dense layers of a sequential model.
Exercise 3.01: Building a Sequential Model with the Keras High-Level API
This exercise shows how to easily build a sequential model, composed of two dense layers, with the Keras high-level API, step by step:
- Import the TensorFlow module and print its version:
from __future__ import absolute_import, division, \
                       print_function, unicode_literals
import tensorflow as tf
print("TensorFlow version: {}".format(tf.__version__))
This outputs the following line:
TensorFlow version: 2.1.0
- Build the model using Keras' sequential and add methods and print a network summary. To parallel the low-level API example, the same activation functions are used. We are using ReLu here, a typical activation function for hidden layers and a key element that provides nonlinearity to the model thanks to its nonlinear shape. We also use softmax, the activation function typically used for output layers in classification problems. It receives the output values (so-called "logits") from the previous layer and normalizes them into a probability distribution over the output classes. The input_dim is the dimension of the input feature vector; it is assumed to have a dimension of 100:
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(units=64, \
activation='relu', input_dim=100))
model.add(tf.keras.layers.Dense(units=10, activation='softmax'))
- Print the standard model architecture:
model.summary()
In our case, the network model summary is as follows:
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_2 (Dense)              (None, 64)                6464
_________________________________________________________________
dense_3 (Dense)              (None, 10)                650
=================================================================
Total params: 7,114
Trainable params: 7,114
Non-trainable params: 0
_________________________________________________________________
The preceding output is a useful visualization that gives us a clear understanding of the layers, their types and shapes, and the number of network parameters (for instance, the first dense layer has 100 × 64 weights plus 64 biases, that is, 6,464 parameters).
Note
To access the source code for this specific section, please refer to https://packt.live/30A9Dw9.
You can also run this example online at https://packt.live/3cT0cKL.
As anticipated, this exercise showed us how to create a sequential model and how to add two layers to it in a very straightforward way.
We will deal with the remaining aspects later on, but it is still worth noting that training the model we just created and performing inference with it only require a few lines of code, as presented in the following snippet, which needs to be appended to the code from Exercise 3.01, Building a Sequential Model with the Keras High-Level API:
model.compile(loss='categorical_crossentropy', optimizer='sgd', \
metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5, batch_size=32)
loss_and_metrics = model.evaluate(x_test, y_test, batch_size=128)
classes = model.predict(x_test, batch_size=128)
If more complex models are required, the sequential API is too limited. For these needs, Keras provides the functional API, which allows us to create models that can handle complex network graphs, such as networks with multiple inputs and/or multiple outputs, recurrent neural networks in which data processing is cyclic rather than strictly sequential, and models in which layers' weights are shared among different parts of the network. For this purpose, Keras lets us leverage the same set of layers as the sequential model, but provides more flexibility in putting them together. First, we have to define the layers and then put them together. An example is presented in the following snippet.
First, after importing TensorFlow, an input layer of dimension 784 is created:
import tensorflow as tf
inputs = tf.keras.layers.Input(shape=(784,))
Inputs are processed by the first hidden layer. They go through the ReLu activation function and are returned as output. This output then becomes the input for the second hidden layer, which is exactly the same as the first one, and returns another output, again stored in the x variable:
x = tf.keras.layers.Dense(64, activation='relu')(inputs)
x = tf.keras.layers.Dense(64, activation='relu')(x)
Finally, the x variable goes as input to the final output layer, which has a softmax activation function, and returns predictions:
predictions = tf.keras.layers.Dense(10, activation='softmax')(x)
Once all these steps have been completed, the model can be created by telling Keras where it starts (the inputs variable) and where it ends (the predictions variable):
model = tf.keras.models.Model(inputs=inputs, outputs=predictions)
After the model has been built, it is compiled by specifying the optimizer, the loss, and the metrics. Finally, it is fitted to the training data:
model.compile(optimizer='rmsprop', \
loss='categorical_crossentropy', \
metrics=['accuracy'])
model.fit(data, labels) # starts training
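The real strength of the functional API emerges with topologies the sequential model cannot express. The following minimal sketch (the input shapes and layer sizes are arbitrary choices for the example) combines two separate inputs into a single prediction, something the sequential model cannot do:

import tensorflow as tf

# Two independent inputs, for example an image embedding and some metadata
input_a = tf.keras.layers.Input(shape=(128,))
input_b = tf.keras.layers.Input(shape=(16,))

# Each input is processed by its own dense branch
branch_a = tf.keras.layers.Dense(64, activation='relu')(input_a)
branch_b = tf.keras.layers.Dense(8, activation='relu')(input_b)

# The two branches are merged and mapped to a single output
merged = tf.keras.layers.concatenate([branch_a, branch_b])
output = tf.keras.layers.Dense(1, activation='sigmoid')(merged)

model = tf.keras.models.Model(inputs=[input_a, input_b], outputs=output)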
Keras provides a large number of predefined layers, as well as the possibility to code custom ones. Among the ones already available are the following (a short sketch combining a few of them is given after the list):
- Dense layers, which are typically used for fully connected neural networks. They consist of a matrix of weights and a bias.
- Convolution layers are filters that are defined by specific kernels, which are then convolved with the inputs they are applied to. There are layers available for different input dimensions, from 1D to 3D, including the possibility to embed in them complex operations, such as cropping or transposition.
- Locally connected layers are similar to convolution layers in the sense that they act only on a subgroup of the input features, but, unlike convolution layers, they don't share weights.
- Pooling layers are layers that are used to downscale the input. As with convolutional layers, they are available for inputs with dimensionality ranging from 1D to 3D. They include most of the common variants, such as max and average pooling.
- Recurrent layers are used for recurrent neural networks, where the output of a layer is also fed backward in the network. They support state-of-the-art units such as Gated Recurrent Units (GRUs), Long Short-Term Memory (LSTM) units, and others.
- Activation functions are also available in the form of layers. These are functions that are applied to layer outputs, such as ReLu, Elu, Linear, Tanh, and Softmax.
- Lambda layers are layers for embedding arbitrary, user-defined expressions.
- Dropout layers are special objects that randomly set a fraction of the input units to 0 at each training update to avoid overfitting (more on this later).
- Noise layers are additional layers that, similar to dropout, are used to avoid overfitting.
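As a small illustrative sketch (the layer sizes and the Lambda expression are arbitrary), a few of the layer types listed above can be combined in a single model; here, Dropout randomly zeroes 20% of the units during training and the Lambda layer simply doubles its input:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_dim=100),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Lambda(lambda x: x * 2.0),
    tf.keras.layers.Dense(10, activation='softmax')])
model.summary()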
Keras also provides common datasets, as well as famous models. For image-related applications, many networks are available, such as Xception, VGG16, VGG19, ResNet50, InceptionV3, InceptionResNetV2, MobileNet, DenseNet, NASNet, and MobileNetV2, all of which are pretrained on ImageNet. Keras also provides text, sequence, and generative models, for a total of more than 40 algorithms.
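For example, one of the bundled datasets can be loaded with a single call; the following is a minimal sketch using the MNIST handwritten digits dataset:

import tensorflow as tf

# Downloads the data on first use and returns NumPy arrays
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
print(x_train.shape, y_train.shape)   # (60000, 28, 28) (60000,)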
As we saw for TensorFlow, Keras models have a vast choice of deployment platforms, including iOS, via CoreML (supported by Apple); Android, via the TensorFlow Android runtime; in a browser, via Keras.js and WebDNN; on Google Cloud, via TensorFlow-Serving; in a Python webapp backend; on the JVM, via DL4J model import; and on a Raspberry Pi.
Now that we've looked at both TensorFlow and Keras, from the next section onward, our main focus will be on how to use them in combination to create deep neural networks. Keras will be used as the high-level API, given its user-friendliness, with TensorFlow serving as the backend.