Modeling and evaluation

In the neuralnet package, the function that we will use is appropriately named neuralnet(). Besides the formula, there are four critical arguments that we need to examine:

  • hidden: This is a vector giving the number of hidden neurons in each layer, for example, c(5, 3) for two hidden layers; the default is 1, meaning a single hidden neuron
  • act.fct: This is the activation function; the default is logistic, and tanh is also available
  • err.fct: This is the function used to calculate the error; the default is sse, but as we are dealing with binary outcomes, we will use ce for cross-entropy
  • linear.output: This is a logical argument on whether or not to skip act.fct at the output neuron; the default is TRUE, so for our binary outcome, this will need to be FALSE
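
To make the difference between the two err.fct choices concrete, here is a minimal base R sketch. The labels and probabilities are made up for illustration, and the formulas are the usual textbook definitions of the sum of squared errors (with a factor of 1/2) and binary cross-entropy, which I am assuming match what the package computes internally:

```r
# Hypothetical labels and predicted probabilities, for illustration only
y <- c(1, 0, 1, 1, 0)
p <- c(0.9, 0.2, 0.7, 0.6, 0.1)

# Sum of squared errors, with the conventional factor of 1/2
sse <- 0.5 * sum((y - p)^2)

# Cross-entropy for binary outcomes
ce <- -sum(y * log(p) + (1 - y) * log(1 - p))

# Make the first prediction confidently wrong: true label 1, probability 0.01
p_bad <- p
p_bad[1] <- 0.01
sse_bad <- 0.5 * sum((y - p_bad)^2)
ce_bad  <- -sum(y * log(p_bad) + (1 - y) * log(1 - p_bad))

# Compare how much each error measure grows
c(sse = sse, ce = ce, sse_bad = sse_bad, ce_bad = ce_bad)
```

Cross-entropy grows much faster than the squared error when a prediction is confidently wrong, which is one reason it is a common choice for binary outcomes.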

You can also specify the algorithm. The default is resilient backpropagation with weight backtracking (rprop+), and we will use it along with the default of one hidden neuron for simplicity:

> nnfit <- neuralnet::neuralnet(form, data = train_treated, err.fct = "ce", linear.output = FALSE)

Here is an abbreviated output of weights for the overall result:

> head(nnfit$result.matrix)
                                                 1
error                               0.024293436369
reached.threshold                   0.009929147409
steps                             181.000000000000
Intercept.to.1layhid1               0.573783967352
stability_lev_x_stab.to.1layhid1   -2.072585716776
stability_lev_x_xstab.to.1layhid1   6.859369770672

We can see that the error is extremely low at 0.024. The steps value is the number of iterations the algorithm needed to reach the threshold, which is the point at which the absolute partial derivatives of the error function become smaller than the stopping threshold (default = 0.01). Here, that took 181 steps.

You can also look at what is known as generalized weights. According to the authors of the neuralnet package, the generalized weight is defined as the contribution of the ith covariate to the log-odds:

The generalized weight expresses the effect of each covariate xi and thus has an analogous interpretation as the ith regression parameter in regression models. However, the generalized weight depends on all other covariates (Gunther and Fritsch, 2010).

The weights can be called and examined. I've abbreviated the output to the first four variables and six observations only. Note that, in the columns shown, every row is a scaled copy of the others: the ratio between any two columns is the same for each observation, which is what you would expect from a network with a single hidden neuron. Please note that your results might be slightly different because of random weight initialization.

The results are as follows:

> head(nnfit$generalized.weights[[1]])
             [,1]            [,2]             [,3]              [,4]
1 0.0004057906237 -0.001342992917 -0.0010654093452 -0.00010947079069
2 0.0003792401307 -0.001255122173 -0.0009957006291 -0.00010230822138
3 0.0003929874040 -0.001300619751 -0.0010317943007 -0.00010601684547
4 0.0003672745975 -0.001215521390 -0.0009642849428 -0.00009908026019
5 0.0273129186450 -0.090394045943 -0.0717104759663 -0.00736825009054
6 0.0255281981170 -0.084487386479 -0.0670246655557 -0.00688678315678
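
To see where numbers like these come from, here is a minimal base R sketch that computes generalized weights by hand for a network with one hidden neuron and logistic activations. All the weights and inputs below are hypothetical and are not taken from the fitted model. Under these assumptions, the log-odds of the output is b2 + v * h, so the derivative with respect to input i is v * h * (1 - h) * w[i]:

```r
# Logistic activation
sigmoid <- function(z) 1 / (1 + exp(-z))

# Hypothetical network: 3 inputs -> 1 hidden neuron -> 1 output
w  <- c(-2.0, 6.5, 0.8)  # input-to-hidden weights (made up)
b1 <- 0.5                # hidden intercept
v  <- 3.0                # hidden-to-output weight
b2 <- -1.0               # output intercept

# Log-odds of the predicted probability for one observation x;
# the logit of sigmoid(b2 + v * h) is simply b2 + v * h
log_odds <- function(x) {
  h <- sigmoid(b1 + sum(w * x))
  b2 + v * h
}

# Analytic generalized weights: d(log-odds) / dx_i = v * h * (1 - h) * w_i
gen_weights <- function(x) {
  h <- sigmoid(b1 + sum(w * x))
  v * h * (1 - h) * w
}

# Three hypothetical observations, one per row
X <- rbind(c(0.1, 0.0, 1.0),
           c(1.0, 0.2, 0.0),
           c(0.0, 1.0, 0.5))
gw <- t(apply(X, 1, gen_weights))
gw

# Dividing each row by its first entry gives the same vector, w / w[1]
sweep(gw, 1, gw[, 1], "/")
```

In this sketch, each observation's generalized weights are the input weights multiplied by a single per-observation scale factor, so the magnitudes vary from row to row while the proportions between covariates stay fixed.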

To visualize the neural network, simply use the plot() function:

> plot(nnfit)

The following is the output of the preceding command:

This plot shows the weights of the features and intercepts.

We now want to see how well the model performs. This is done with the compute() function, specifying the fitted model and the covariates:

> test_pred <- neuralnet::compute(nnfit, test_treated[, 1:16])

> test_prob <- test_pred$net.result

These results are in probabilities, so let's turn them into 0 or 1 and follow this up with a confusion matrix and log-loss:

> pred <- ifelse(test_prob >= 0.5, 1, 0) 

> table(pred, test_treated$y)

pred  0  1
   0 41  0
   1  3 58
> MLmetrics::LogLoss(test_prob, test_treated$y)
[1] 0.2002453861
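
As a quick check on these numbers, the counts from the confusion matrix can be turned into summary metrics by hand. The following is a base R sketch; the log_loss() helper is my own illustration of the usual mean binary log-loss with clipping, not the MLmetrics source, and the probabilities passed to it are made up:

```r
# Confusion matrix from the output above: rows are predictions, columns are actual labels
cm <- matrix(c(41, 3, 0, 58), nrow = 2,
             dimnames = list(pred = c("0", "1"), actual = c("0", "1")))

accuracy  <- sum(diag(cm)) / sum(cm)        # (41 + 58) / 102
precision <- cm["1", "1"] / sum(cm["1", ])  # 58 / 61: share of predicted 1s that are truly 1
recall    <- cm["1", "1"] / sum(cm[, "1"])  # 58 / 58: share of actual 1s that were found

c(accuracy = accuracy, precision = precision, recall = recall)

# Mean binary log-loss; probabilities are clipped to avoid log(0)
log_loss <- function(p, y, eps = 1e-15) {
  p <- pmin(pmax(p, eps), 1 - eps)
  -mean(y * log(p) + (1 - y) * log(1 - p))
}

# Hypothetical probabilities and labels, for illustration only
log_loss(c(0.9, 0.2, 0.7), c(1, 0, 1))
```

The three misclassified observations are the ones predicted as 1 whose actual label is 0, which is why precision falls below 1 while recall does not.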

The model achieved about 97% accuracy on the test set (99 of 102 observations correct), with three false positives and no false negatives. I'll leave it to you to see if you can build a neural network that achieves 100% accuracy!