Simple Classification Using TensorFlow

This section will help you understand and solve a typical supervised learning problem that falls under the category conventionally named classification.

Classification tasks, in their simplest generic form, aim to assign each instance to one category from a predefined set. An intuitive example of a classification task that's often used in introductory courses is classifying images of domestic pets into the category they belong to, such as "cat" or "dog." Classification plays a fundamental role in many everyday activities and can easily be encountered in different contexts. The previous example is a specific case of classification called image classification, and many similar applications can be found in this category.

However, classification extends beyond images. The following are some examples:

  • Customer classification for video recommendation systems (answering the question, "Which market segment does this user fall into?")
  • Spam filters ("What are the chances this email is spam?")
  • Malware detection ("Is this program a cyber threat?")
  • Medical diagnosis ("Is this patient sick?")

For image classification tasks, images are fed to the classification algorithm as inputs, and it returns the class they belong to as output. Images are three-dimensional arrays of numbers representing per-pixel brightness (height x width x number of channels, where color images have three channels – red, green, blue (RGB) – and grayscale images only have one), and these numbers are the features that the algorithm uses to determine the class images belong to.

When dealing with other types of inputs, features can be different. For example, in the case of a medical diagnosis classification system, blood test parameters, age, sex, and suchlike can be features that are used by the algorithm to identify the class the instance belongs to, that is, "sick" or "not sick."

In the following exercise, we will create a deep neural network by building upon what we described in the previous sections. The network will achieve an accuracy of around 70% when classifying signals detected inside a simulated ATLAS experiment, distinguishing between background processes and Higgs boson signal events using a set of 28 features: yes, machine learning applied to particle physics!

Note

For additional information on the dataset, visit the official website: http://archive.ics.uci.edu/ml/datasets/HIGGS.

Given the huge size of the dataset, and to keep the exercise easy to run yet still meaningful, it will be subsampled: 10,000 rows will be used for training and 1,000 rows each for validation and test. Three different models will be trained: a small reference model (two layers with 16 and 1 neurons, respectively); a large model with no countermeasures against overfitting (five layers: four with 512 neurons each and a final one with 1 neuron), to demonstrate the problems that may be encountered in this scenario; and, finally, the same large model with regularization and dropout added, which effectively limits overfitting and improves performance.
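To get a feel for how different the models are in capacity, note that a fully connected (Dense) layer has inputs × units weights plus one bias per unit. The following plain-Python sketch, assuming the 28 input features and the layer sizes listed above, does that arithmetic; `dense_params` and `count_params` are hypothetical helpers for illustration only. Dropout layers have no parameters, so the regularized model has the same count as the large one.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of one fully connected layer (hypothetical helper)."""
    return n_inputs * n_units + n_units


def count_params(n_features, layer_sizes):
    """Total parameters of a stack of Dense layers fed with n_features inputs."""
    total = 0
    n_in = n_features
    for n_units in layer_sizes:
        total += dense_params(n_in, n_units)
        n_in = n_units  # the next layer receives this layer's outputs
    return total


N_FEATURES = 28
small = count_params(N_FEATURES, [16, 1])
large = count_params(N_FEATURES, [512, 512, 512, 512, 1])

print("small model:", small)  # small model: 481
print("large model:", large)  # large model: 803329
print("ratio: about {:.0f}x".format(large / small))
```

The large model has roughly 1,670 times as many parameters as the small one while being trained on only 10,000 rows, which is why it is expected to overfit.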

Exercise 3.06: Creating a Deep Neural Network to Classify Events Generated by the ATLAS Experiment in the Quest for Higgs Boson

In this exercise, we will build, train, and measure the performance of a deep neural network in order to improve the discovery significance of the ATLAS experiment by using simulated data with features for characterizing events. The task is to classify events into two categories: "Higgs boson signal" versus "background."

This dataset is also available in TensorFlow Datasets (https://www.tensorflow.org/datasets), a collection of ready-to-use datasets that can be downloaded and consumed through the tf.data input pipeline. In our case, the original dataset is too big for our purposes, so we will postpone using it until we get to this chapter's activity. For now, we will use a subset of the dataset that's directly available through the repository.

Note

You can find the dataset in this book's GitHub repository here: https://packt.live/3dUfYq8.

The step-by-step procedure is described in detail as follows:

  1. Import all the required modules and print the versions of the most important ones:

    # The __future__ imports are only needed for Python 2; they are harmless in Python 3
    from __future__ import absolute_import, division, \
        print_function, unicode_literals
    from IPython import display
    from matplotlib import pyplot as plt
    # scipy.ndimage.filters is deprecated; import from scipy.ndimage instead
    from scipy.ndimage import gaussian_filter1d
    import pandas as pd
    import numpy as np
    import tensorflow as tf

    print("TensorFlow version: {}".format(tf.__version__))

    The output will be as follows:

    TensorFlow version: 2.1.0

  2. Download the dataset.

    For this exercise, we will download a custom-made smaller subset that's been pulled from the original dataset:

    higgs_path = tf.keras.utils.get_file('HIGGSSmall.csv.gz', \
                 'https://github.com/PacktWorkshops/'\
                 'The-Reinforcement-Learning-Workshop/blob/'\
                 'master/Chapter03/Dataset/HIGGSSmall.csv.gz?raw=true')

  3. Read the CSV dataset into a tf.data dataset and repack it so that it yields (features, label) tuples:

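    N_TEST = int(1e3)
    N_VALIDATION = int(1e3)
    N_TRAIN = int(1e4)
    BUFFER_SIZE = int(N_TRAIN)
    BATCH_SIZE = 500
    STEPS_PER_EPOCH = N_TRAIN//BATCH_SIZE
    N_FEATURES = 28

    # Each CSV row has 29 float columns: the label followed by 28 features
    ds = tf.data.experimental\
         .CsvDataset(higgs_path,[float(),]*(N_FEATURES+1), \
                     compression_type="GZIP")

    def pack_row(*row):
        # Column 0 is the label; stack the remaining columns into one tensor
        label = row[0]
        features = tf.stack(row[1:],1)
        return features, label

    # Batching first lets pack_row process large chunks at once; unbatch()
    # then restores one (features, label) pair per element
    packed_ds = ds.batch(N_TRAIN).map(pack_row).unbatch()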

    Take a look at the value distribution of the features:

    for features,label in packed_ds.batch(1000).take(1):
        print(features[0])
        plt.hist(features.numpy().flatten(), bins = 101)

    The output will be as follows:

    tf.Tensor(
    [ 0.8692932 -0.6350818 0.22569026 0.32747006 -0.6899932
      0.7542022 -0.2485731 -1.0920639 0. 1.3749921
     -0.6536742 0.9303491 1.1074361 1.1389043 -1.5781983
     -1.0469854 0. 0.65792954 -0.01045457 -0.04576717
      3.1019614 1.35376 0.9795631 0.97807616 0.92000484
      0.72165745 0.98875093 0.87667835], shape=(28,), dtype=float32)

    The plot will be as follows:

    Figure 3.21: First feature value distribution

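    In the preceding graph, the x axis represents the feature values, while the y axis denotes the number of samples falling into each of the 101 bins. Because the array is flattened before plotting, the histogram pools the values of all 28 features across the 1,000 rows of the batch.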
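The pack_row function relies on the layout of the HIGGS CSV file: the first column is the label and the remaining 28 columns are the features. The following plain-Python sketch shows the same split on a single line of text. The `sample_line` values are synthetic (generated, not taken from the dataset) and `split_row` is a hypothetical helper, not part of the exercise code.

```python
import csv
import io

N_FEATURES = 28


def split_row(values):
    """Mimic pack_row: column 0 is the label, columns 1..28 are the features."""
    label = values[0]
    features = values[1:]
    return features, label


# Synthetic CSV line (hypothetical values): label first, then 28 features
sample_line = ",".join(["1.0"] + [str(0.1 * i) for i in range(N_FEATURES)])

# Parse it with every column read as a float, as the CsvDataset call does
row = [float(v) for v in next(csv.reader(io.StringIO(sample_line)))]
features, label = split_row(row)

print(len(row), len(features), label)  # 29 28 1.0
```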

  4. Create training, validation, and test sets:

    validate_ds = packed_ds.take(N_VALIDATION).cache()
    test_ds = packed_ds.skip(N_VALIDATION).take(N_TEST).cache()
    train_ds = packed_ds.skip(N_VALIDATION+N_TEST)\
               .take(N_TRAIN).cache()
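The take/skip calls carve consecutive, non-overlapping slices from the beginning of the stream: rows 0 to 999 for validation, rows 1,000 to 1,999 for test, and rows 2,000 to 11,999 for training. The plain-Python sketch below reproduces that logic on a list of row indices, which stands in for packed_ds under the assumption that the file holds at least 12,000 rows. The `take` and `skip` helpers are hypothetical and only mimic the tf.data methods of the same name.

```python
from itertools import islice

N_TEST = 1000
N_VALIDATION = 1000
N_TRAIN = 10000


def take(seq, n):
    """First n elements, like Dataset.take(n)."""
    return list(islice(seq, n))


def skip(seq, n):
    """Everything after the first n elements, like Dataset.skip(n)."""
    return seq[n:]


# Row indices standing in for the records of packed_ds (extra rows are unused)
rows = list(range(N_VALIDATION + N_TEST + N_TRAIN + 500))

validate = take(rows, N_VALIDATION)
test = take(skip(rows, N_VALIDATION), N_TEST)
train = take(skip(rows, N_VALIDATION + N_TEST), N_TRAIN)

print(validate[0], validate[-1])  # 0 999
print(test[0], test[-1])          # 1000 1999
print(train[0], train[-1])        # 2000 11999
```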

  5. Define feature, label, and class names:

    feature_names = ["lepton pT", "lepton eta", "lepton phi",\
                     "missing energy magnitude", \
                     "missing energy phi",\
                     "jet 1 pt", "jet 1 eta", "jet 1 phi",\
                     "jet 1 b-tag",\
                     "jet 2 pt", "jet 2 eta", "jet 2 phi",\
                     "jet 2 b-tag",\
                     "jet 3 pt", "jet 3 eta", "jet 3 phi",\
                     "jet 3 b-tag",\
                     "jet 4 pt", "jet 4 eta", "jet 4 phi",\
                     "jet 4 b-tag",\
                     "m_jj", "m_jjj", "m_lv", "m_jlv", "m_bb",\
                     "m_wbb", "m_wwbb"]
    label_name = ['Measure']
    class_names = ['Signal', 'Background']

    print("Features: {}".format(feature_names))
    print("Label: {}".format(label_name))
    print("Class names: {}".format(class_names))

    The output will be as follows:

    Features: ['lepton pT', 'lepton eta', 'lepton phi',
    'missing energy magnitude', 'missing energy phi',
    'jet 1 pt', 'jet 1 eta', 'jet 1 phi', 'jet 1 b-tag',
    'jet 2 pt', 'jet 2 eta', 'jet 2 phi', 'jet 2 b-tag',
    'jet 3 pt', 'jet 3 eta', 'jet 3 phi', 'jet 3 b-tag',
    'jet 4 pt', 'jet 4 eta', 'jet 4 phi', 'jet 4 b-tag',
    'm_jj', 'm_jjj', 'm_lv', 'm_jlv', 'm_bb', 'm_wbb', 'm_wwbb']
    Label: ['Measure']
    Class names: ['Signal', 'Background']

  6. Show a sample of a training instance for features and labels:

    features, labels = next(iter(train_ds))
    print("Features =")
    print(features.numpy())
    print("Labels =")
    print(labels.numpy())

    The output will be as follows:

    Features =
    [ 0.3923715 1.3781117 1.5673449 0.17123567 1.6574531
      0.86394763 0.88821083 1.4797885 2.1730762 1.2008675
      0.9490923 -0.30092147 2.2148721 1.277294 0.4025028
      0.50748837 0. 0.50555664 -0.55428815 -0.7055601
      0. 0.94152564 0.9448251 0.9839765 0.7801499
      1.4989641 0.91668195 0.8027126 ]
    Labels =
    0.0

  7. Assign a batch size to the datasets:

    test_ds = test_ds.batch(BATCH_SIZE)
    validate_ds = validate_ds.batch(BATCH_SIZE)
    train_ds = train_ds.shuffle(BUFFER_SIZE).repeat()\
               .batch(BATCH_SIZE)
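Because train_ds is repeated indefinitely with repeat(), Keras cannot tell where an epoch ends, which is why steps_per_epoch is passed to fit() later on. With 10,000 training rows and batches of 500, one epoch is 20 batches, matching the "20/20" shown in the training logs, while the unrepeated validation and test sets yield 2 batches each. The arithmetic, sketched in plain Python:

```python
import math

N_TRAIN, N_VALIDATION, N_TEST = 10000, 1000, 1000
BATCH_SIZE = 500

STEPS_PER_EPOCH = N_TRAIN // BATCH_SIZE
# batch() keeps a final partial batch by default, hence the ceiling division
val_batches = math.ceil(N_VALIDATION / BATCH_SIZE)
test_batches = math.ceil(N_TEST / BATCH_SIZE)

print(STEPS_PER_EPOCH, val_batches, test_batches)  # 20 2 2
```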

  8. Now, let's start creating the models and training them. First, create a learning rate schedule that decays over time:

    lr_schedule = tf.keras.optimizers.schedules\
                  .InverseTimeDecay(0.001,\
                                    decay_steps=STEPS_PER_EPOCH*1000, \
                                    decay_rate=1, staircase=False)
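InverseTimeDecay computes the learning rate at a given optimizer step as initial_learning_rate / (1 + decay_rate * step / decay_steps), flooring step / decay_steps first when staircase=True. With decay_steps = STEPS_PER_EPOCH*1000 = 20,000 steps, the rate is halved after 1,000 epochs and reduced to a third after 2,000. The plain-Python function below re-implements that formula for illustration; `inverse_time_decay` is a hypothetical name, not a TensorFlow API.

```python
import math

STEPS_PER_EPOCH = 20


def inverse_time_decay(step, initial_lr=0.001,
                       decay_steps=STEPS_PER_EPOCH * 1000,
                       decay_rate=1.0, staircase=False):
    """Mirror of the InverseTimeDecay formula (hypothetical helper)."""
    p = step / decay_steps
    if staircase:
        p = math.floor(p)  # decay in discrete jumps instead of continuously
    return initial_lr / (1 + decay_rate * p)


for epoch in (0, 500, 1000, 2000, 3000):
    step = epoch * STEPS_PER_EPOCH  # one optimizer step per batch
    print(epoch, inverse_time_decay(step))
```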

  9. Define a function that will compile a model with an Adam optimizer, use binary cross entropy as the loss function, and fit it on training data by using early stopping on the validation dataset.

    The function takes a model as input, creates an Adam optimizer driven by the learning rate schedule, and compiles the model with the binary cross entropy loss, tracking binary cross entropy and accuracy as metrics:

    def compile_and_fit(model, name, max_epochs=3000):
        # 'name' is not used in this version of the function
        optimizer = tf.keras.optimizers.Adam(lr_schedule)
        # The last layer has no activation, so the model outputs logits
        model.compile(optimizer=optimizer,\
                      loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),\
                      metrics=[tf.keras.losses.BinaryCrossentropy(from_logits=True,\
                               name='binary_crossentropy'),'accuracy'])

    A summary of the model is then printed, as follows:

        model.summary()

  10. The model is then fitted on the training dataset using a validation dataset and the early stopping callback. The training history is saved and returned as output:

        history = model.fit(train_ds, \
                  steps_per_epoch = STEPS_PER_EPOCH,\
                  epochs=max_epochs, validation_data=validate_ds, \
                  callbacks=[tf.keras.callbacks\
                             .EarlyStopping\
                             (monitor='val_binary_crossentropy',\
                             patience=200)],verbose=2)
        return history
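The EarlyStopping callback watches val_binary_crossentropy and ends training once it has not improved for patience=200 consecutive epochs; with default settings it does not restore the best weights, so the model keeps those of the last epoch. The simplified plain-Python sketch below assumes "improved" means strictly lower (the default min_delta=0) and ignores the callback's other options; `early_stopping_epoch` and the validation curve are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=200):
    """Return the 0-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:       # improvement: remember it and reset the counter
            best = loss
            wait = 0
        else:                 # no improvement: count one more idle epoch
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Hypothetical validation curve: improves for 300 epochs, then gets worse
curve = [1.0 - 0.001 * e for e in range(300)] + [0.8 + 0.001 * e for e in range(500)]
print(early_stopping_epoch(curve))  # 499
```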

  11. Create a small model with just two layers, with 16 and 1 neurons, respectively, then compile and fit it on the dataset:

    small_model = tf.keras.Sequential([\
                  tf.keras.layers.Dense(16, activation='elu',\
                                        input_shape=(N_FEATURES,)),\
                  tf.keras.layers.Dense(1)])
    size_histories = {}
    size_histories['small'] = compile_and_fit(small_model, 'sizes/small')

    This will produce a long output, ending with lines similar to the following:

    Epoch 1522/3000
    20/20 - 0s - loss: 0.5693 - binary_crossentropy: 0.5693
    - accuracy: 0.6846 - val_loss: 0.5841
    - val_binary_crossentropy: 0.5841 - val_accuracy: 0.6640
    Epoch 1523/3000
    20/20 - 0s - loss: 0.5695 - binary_crossentropy: 0.5695
    - accuracy: 0.6822 - val_loss: 0.5845
    - val_binary_crossentropy: 0.5845 - val_accuracy: 0.6600
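The loss reported in these logs is binary cross entropy computed from logits: since the final Dense(1) layer has no activation, from_logits=True makes the loss apply the sigmoid internally. For a label y in {0, 1} and a logit z, the per-example loss -[y log σ(z) + (1 - y) log(1 - σ(z))] can be rewritten in the numerically stable form max(z, 0) - z·y + log(1 + e^(-|z|)), and the reported value is its average over the batch. The plain-Python sketch below compares the two forms; it illustrates the formula, not TensorFlow's actual code, and all function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce_naive(y, z):
    """Cross entropy computed from the sigmoid probability."""
    p = sigmoid(z)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def bce_from_logits(y, z):
    """Numerically stable form that works directly on the logit."""
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))


def mean_bce(labels, logits):
    """Average loss over a batch."""
    return sum(bce_from_logits(y, z) for y, z in zip(labels, logits)) / len(labels)


for y, z in [(1, 2.0), (0, 2.0), (1, -0.5)]:
    print(y, z, bce_naive(y, z), bce_from_logits(y, z))

# For a large logit, sigmoid(50.0) rounds to 1.0 and the naive form would
# call log(0); the stable form still returns a value close to 50.0
print(bce_from_logits(0, 50.0))
```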

  12. Check the model's performance on the test set:

    test_accuracy = tf.keras.metrics.Accuracy()
    for (features, labels) in test_ds:
        logits = small_model(features)
        # Convert logits to probabilities, then threshold them at 0.5
        probabilities = tf.keras.activations.sigmoid(logits)
        predictions = 1*(probabilities.numpy() > 0.5)
        # Metrics expect (y_true, y_pred)
        test_accuracy(labels, predictions)
    small_model_accuracy = test_accuracy.result()
    print("Test set accuracy: {:.3%}".format(small_model_accuracy))

    The output will be as follows:

    Test set accuracy: 68.200%

    Note

    The accuracy may show slightly different values due to random sampling with a variable random seed.
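The evaluation loop turns each logit into a probability with the sigmoid and thresholds it at 0.5. Since sigmoid(z) > 0.5 exactly when z > 0, this is equivalent to checking the sign of the logit. The plain-Python sketch below reproduces the loop's logic on a few hypothetical logits and labels, together with the fraction of matching predictions that tf.keras.metrics.Accuracy accumulates; `predict` and `accuracy` are hypothetical helpers.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(logits, threshold=0.5):
    """Turn logits into 0/1 class predictions, as in the evaluation loop."""
    return [1 if sigmoid(z) > threshold else 0 for z in logits]


def accuracy(labels, predictions):
    """Fraction of predictions equal to the labels."""
    matches = sum(1 for y, p in zip(labels, predictions) if y == p)
    return matches / len(labels)


# Hypothetical model outputs and binary ground-truth labels
logits = [2.3, -0.7, 0.1, -1.5, 0.4]
labels = [1.0, 0.0, 0.0, 0.0, 1.0]

predictions = predict(logits)
print(predictions)  # [1, 0, 1, 0, 1]
print("Test set accuracy: {:.3%}".format(accuracy(labels, predictions)))  # 80.000%
```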

  13. Create a large model with five layers (four with 512 neurons each and a final one with 1 neuron), then compile and fit it:

    large_model = tf.keras.Sequential([\
                  tf.keras.layers.Dense(512, activation='elu',\
                                        input_shape=(N_FEATURES,)),\
                  tf.keras.layers.Dense(512, activation='elu'),\
                  tf.keras.layers.Dense(512, activation='elu'),\
                  tf.keras.layers.Dense(512, activation='elu'),\
                  tf.keras.layers.Dense(1)])
    size_histories['large'] = compile_and_fit(large_model, "sizes/large")

    This will produce a long output, ending with lines similar to the following:

    Epoch 221/3000
    20/20 - 0s - loss: 1.0285e-04 - binary_crossentropy: 1.0285e-04
    - accuracy: 1.0000 - val_loss: 2.5506
    - val_binary_crossentropy: 2.5506 - val_accuracy: 0.6660
    Epoch 222/3000
    20/20 - 0s - loss: 1.0099e-04 - binary_crossentropy: 1.0099e-04
    - accuracy: 1.0000 - val_loss: 2.5586
    - val_binary_crossentropy: 2.5586 - val_accuracy: 0.6650

  14. Check the model's performance on the test set:

    test_accuracy = tf.keras.metrics.Accuracy()
    for (features, labels) in test_ds:
        logits = large_model(features)
        probabilities = tf.keras.activations.sigmoid(logits)
        predictions = 1*(probabilities.numpy() > 0.5)
        # Metrics expect (y_true, y_pred)
        test_accuracy(labels, predictions)
    large_model_accuracy = test_accuracy.result()
    print("Test set accuracy: {:.3%}".format(large_model_accuracy))

    The output will be as follows:

    Test set accuracy: 65.200%

    Note

    The accuracy may show slightly different values due to random sampling with a variable random seed.

  15. Create the same large model as before, but add L2 weight regularization and dropout to it. Then, compile and fit the model:

    regularization_model = tf.keras.Sequential([\
                           tf.keras.layers.Dense(512,\
                           kernel_regularizer=tf.keras.regularizers\
                                              .l2(0.0001),\
                           activation='elu', \
                           input_shape=(N_FEATURES,)),\
                           tf.keras.layers.Dropout(0.5),\
                           tf.keras.layers.Dense(512,\
                           kernel_regularizer=tf.keras.regularizers\
                                              .l2(0.0001),\
                           activation='elu'),\
                           tf.keras.layers.Dropout(0.5),\
                           tf.keras.layers.Dense(512,\
                           kernel_regularizer=tf.keras.regularizers\
                                              .l2(0.0001),\
                           activation='elu'),\
                           tf.keras.layers.Dropout(0.5),\
                           tf.keras.layers.Dense(512,\
                           kernel_regularizer=tf.keras.regularizers\
                                              .l2(0.0001),\
                           activation='elu'),\
                           tf.keras.layers.Dropout(0.5),\
                           tf.keras.layers.Dense(1)])

    size_histories['regularization'] = compile_and_fit\
                                       (regularization_model,\
                                        "regularizers/regularization",\
                                        max_epochs=9000)

    This will produce a long output, ending with lines similar to the following:

    Epoch 1264/9000
    20/20 - 0s - loss: 0.5873 - binary_crossentropy: 0.5469
    - accuracy: 0.6978 - val_loss: 0.5819
    - val_binary_crossentropy: 0.5416 - val_accuracy: 0.7030
    Epoch 1265/9000
    20/20 - 0s - loss: 0.5868 - binary_crossentropy: 0.5465
    - accuracy: 0.7024 - val_loss: 0.5759
    - val_binary_crossentropy: 0.5356 - val_accuracy: 0.7100
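Unlike in the small and large models' logs, loss is now higher than binary_crossentropy (for example, 0.5873 versus 0.5469). The gap is the L2 penalty: with kernel_regularizer=l2(0.0001), Keras adds 0.0001 times the sum of the squared kernel weights of each regularized layer to the training loss, while the binary_crossentropy metric reports the data term alone. Dropout(0.5), in turn, zeroes each activation with probability 0.5 during training and scales the surviving ones by 1/(1 - 0.5) = 2, leaving inputs untouched at inference time. Both mechanisms are sketched below in plain Python; the helper names and toy weights are hypothetical.

```python
import random


def l2_penalty(kernels, factor=0.0001):
    """Sum of factor * w**2 over every weight of every regularized kernel."""
    return sum(factor * w * w for kernel in kernels for w in kernel)


def dropout(activations, rate=0.5, training=True, rng=random):
    """Inverted dropout: zero with probability `rate`, scale survivors by 1/(1-rate)."""
    if not training:
        return list(activations)  # inference: no dropping, no scaling
    scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else a * scale for a in activations]


# Toy flattened kernels (hypothetical values), one list per Dense layer
kernels = [[0.5, -1.0, 2.0], [3.0]]
data_loss = 0.5469  # the binary_crossentropy term
total_loss = data_loss + l2_penalty(kernels)
print(total_loss)

rng = random.Random(0)
print(dropout([1.0, 2.0, 3.0, 4.0], rng=rng))         # each entry is 0.0 or doubled
print(dropout([1.0, 2.0, 3.0, 4.0], training=False))  # unchanged at inference
```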

  16. Check the model's performance on the test set:

    test_accuracy = tf.keras.metrics.Accuracy()
    for (features, labels) in test_ds:
        logits = regularization_model(features)
        probabilities = tf.keras.activations.sigmoid(logits)
        predictions = 1*(probabilities.numpy() > 0.5)
        # Metrics expect (y_true, y_pred)
        test_accuracy(labels, predictions)
    regularization_model_accuracy = test_accuracy.result()
    print("Test set accuracy: {:.3%}".format(regularization_model_accuracy))

    The output will be as follows:

    Test set accuracy: 69.300%

    Note

    The accuracy may show slightly different values due to random sampling with a variable random seed.

  17. Compare the binary cross entropy trend of the three models over epochs:

    histSmall = pd.DataFrame(size_histories["small"].history)
    histSmall['epoch'] = size_histories["small"].epoch
    histLarge = pd.DataFrame(size_histories["large"].history)
    histLarge['epoch'] = size_histories["large"].epoch
    histReg = pd.DataFrame(size_histories["regularization"].history)
    histReg['epoch'] = size_histories["regularization"].epoch

    trainSmoothSmall = gaussian_filter1d\
                       (histSmall['binary_crossentropy'], sigma=3)
    testSmoothSmall = gaussian_filter1d\
                      (histSmall['val_binary_crossentropy'], sigma=3)
    trainSmoothLarge = gaussian_filter1d\
                       (histLarge['binary_crossentropy'], sigma=3)
    testSmoothLarge = gaussian_filter1d\
                      (histLarge['val_binary_crossentropy'], sigma=3)
    trainSmoothReg = gaussian_filter1d\
                     (histReg['binary_crossentropy'], sigma=3)
    testSmoothReg = gaussian_filter1d\
                    (histReg['val_binary_crossentropy'], sigma=3)

    plt.plot(histSmall['epoch'], trainSmoothSmall, '-', \
             histSmall['epoch'], testSmoothSmall, '--')
    plt.plot(histLarge['epoch'], trainSmoothLarge, '-', \
             histLarge['epoch'], testSmoothLarge, '--')
    plt.plot(histReg['epoch'], trainSmoothReg, '-', \
             histReg['epoch'], testSmoothReg, '--')
    plt.ylim([0.5, 0.7])
    plt.ylabel('Binary Crossentropy')
    plt.legend(["Small Training", "Small Validation", \
                "Large Training", "Large Validation", \
                "Regularization Training", \
                "Regularization Validation"])

    This will produce the following graph:

    Figure 3.22: Binary cross entropy comparison

    The preceding graph shows a comparison of the different models, in terms of both training and validation errors, to demonstrate how overfitting works. The training error goes down for each of them as the number of training epochs increases. The validation error for the large model, on the other hand, rapidly increases after a certain number of epochs. In the small model, it goes down, following the training error closely and reaching a final performance that is worse than the one obtained by the model with regularization, which avoids overfitting and has the best performance among the three.
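gaussian_filter1d is used here only to smooth the noisy per-epoch curves before plotting: each point is replaced by a weighted average of its neighbors, with weights following a Gaussian of standard deviation sigma. The plain-Python sketch below is a simplified stand-in, not SciPy's implementation; it truncates the kernel at 4 sigma (SciPy's default truncate=4.0) and mirrors the sequence at its edges, in the spirit of SciPy's default mode='reflect'. The helper names and the sample values are hypothetical.

```python
import math


def gaussian_kernel(sigma, truncate=4.0):
    """Normalized Gaussian weights for offsets -radius..radius."""
    radius = int(truncate * sigma + 0.5)
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights], radius


def reflect_index(i, n):
    """Map any index into 0..n-1 by mirroring: (d c b a | a b c d | d c b a)."""
    i %= 2 * n
    return i if i < n else 2 * n - i - 1


def smooth(values, sigma):
    """Gaussian smoothing of a 1-D sequence (simplified sketch)."""
    weights, radius = gaussian_kernel(sigma)
    n = len(values)
    return [
        sum(w * values[reflect_index(i + k, n)]
            for k, w in zip(range(-radius, radius + 1), weights))
        for i in range(n)
    ]


noisy = [0.60, 0.72, 0.58, 0.69, 0.55, 0.66, 0.52, 0.63]  # hypothetical losses
print([round(v, 3) for v in smooth(noisy, sigma=1)])
```

A larger sigma (the exercise uses 3 for the loss curves and 6 for accuracy) averages over more epochs and produces flatter curves.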

  18. Compare the accuracy trend of the three models over epochs:

    trainSmoothSmall = gaussian_filter1d\
                       (histSmall['accuracy'], sigma=6)
    testSmoothSmall = gaussian_filter1d\
                      (histSmall['val_accuracy'], sigma=6)
    trainSmoothLarge = gaussian_filter1d\
                       (histLarge['accuracy'], sigma=6)
    testSmoothLarge = gaussian_filter1d\
                      (histLarge['val_accuracy'], sigma=6)
    trainSmoothReg = gaussian_filter1d\
                     (histReg['accuracy'], sigma=6)
    testSmoothReg = gaussian_filter1d\
                    (histReg['val_accuracy'], sigma=6)

    plt.plot(histSmall['epoch'], trainSmoothSmall, '-', \
             histSmall['epoch'], testSmoothSmall, '--')
    plt.plot(histLarge['epoch'], trainSmoothLarge, '-', \
             histLarge['epoch'], testSmoothLarge, '--')
    plt.plot(histReg['epoch'], trainSmoothReg, '-', \
             histReg['epoch'], testSmoothReg, '--')
    plt.ylim([0.5, 0.75])
    plt.ylabel('Accuracy')
    plt.legend(["Small Training", "Small Validation", \
                "Large Training", "Large Validation",\
                "Regularization Training", \
                "Regularization Validation"])

    This will produce the following graph:

Figure 3.23: Accuracy comparison

Mirroring the previous graph, this one again compares the different models, but in terms of accuracy. The training accuracy grows for each model as the number of training epochs increases. The validation accuracy for the large model, on the other hand, stops growing after a certain number of epochs. In the small model, it goes up, closely following the training accuracy and reaching a final performance that is worse than the one obtained by the model with regularization, which avoids overfitting and attains the best performance among the three.

Note

To access the source code for this specific section, please refer to https://packt.live/37m9huu.

You can also run this example online at https://packt.live/3hhIDaZ.

In this section, we solved an interesting classification problem, creating a deep learning model able to achieve about 70% accuracy when classifying Higgs boson-related signals using simulated ATLAS experiment data. After a first general overview of the dataset, in which we saw how it is arranged and the nature of its features and labels, we created a set of three deep, fully connected neural networks using the Keras API. These models were trained and tested, and their performance in terms of loss and accuracy over epochs was compared, giving us a firm grasp of the overfitting problem and of the techniques that help to solve it.