Chi-square and McNemar test

We had five models for the hypothyroid data. We then calculated the accuracy and were satisfied with the numbers. Let's now look at the number of errors that each fitted model makes. We have 636 observations in the test partition, and 42 of them test positive for the hypothyroid problem. Note that if we marked all the patients as negative, we would get an accuracy of 1 - 42/636 = 0.934, or about 93.4%, so a high accuracy is not impressive by itself. Using the table function, we pit the predicted values against the actuals and see how often each fitted model goes wrong. Identifying a hypothyroid case as hypothyroid and a negative case as negative is a correct prediction, while marking a hypothyroid case as negative, or vice versa, is an error. In each table, the rows are the predictions and the columns are the actuals. In the logistic regression table the classes are coded as 1 and 2; the first column sums to 42, so 1 stands for hypothyroid. For each model, we look at the misclassification errors:

> table(LR_Predict_Bin,testY_numeric)
              testY_numeric
LR_Predict_Bin   1   2
             1  32   7
             2  10 587
> table(NN_Predict,HT2_TestY)
             HT2_TestY
NN_Predict    hypothyroid negative
  hypothyroid          41       22
  negative              1      572
> table(NB_predict,HT2_TestY)
             HT2_TestY
NB_predict    hypothyroid negative
  hypothyroid          33        8
  negative              9      586
> table(CT_predict,HT2_TestY)
             HT2_TestY
CT_predict    hypothyroid negative
  hypothyroid          38        4
  negative              4      590
> table(SVM_predict,HT2_TestY)
             HT2_TestY
SVM_predict   hypothyroid negative
  hypothyroid          34        2
  negative              8      592
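
The error counts can be read off these tables directly. The following is a minimal sketch that re-enters the counts printed above by hand rather than using the fitted objects; the helper make_cm and the names cms and summary_tab are illustrative and not part of the original session.

```r
# Build a 2 x 2 confusion matrix (rows = predicted, columns = actual).
# make_cm is a hypothetical helper; matrix() fills column by column.
make_cm <- function(tp, fp, fn, tn) {
  matrix(c(tp, fn, fp, tn), nrow = 2,
         dimnames = list(Predicted = c("hypothyroid", "negative"),
                         Actual    = c("hypothyroid", "negative")))
}

# Counts copied from the five tables above
cms <- list(
  LR  = make_cm(32,  7, 10, 587),
  NN  = make_cm(41, 22,  1, 572),
  NB  = make_cm(33,  8,  9, 586),
  CT  = make_cm(38,  4,  4, 590),
  SVM = make_cm(34,  2,  8, 592)
)

# Baseline accuracy when every patient is marked as negative
baseline <- 1 - 42 / 636

# Misclassification count, accuracy, and share of hypothyroid cases caught
summary_tab <- t(sapply(cms, function(cm) c(
  errors      = sum(cm) - sum(diag(cm)),
  accuracy    = sum(diag(cm)) / sum(cm),
  sensitivity = cm[1, 1] / sum(cm[, 1])
)))

print(round(baseline, 4))
print(round(summary_tab, 4))
```

The errors column gives 17, 23, 17, 8, and 10 misclassifications for the five models in the order listed, which makes it easier to compare them against the all-negative baseline.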

From the misclassification tables, we can see that the neural network identifies 41 of the 42 hypothyroid cases correctly, but it also flags 22 negative cases as hypothyroid, far more false positives than any other model. The question that arises is whether the correct predictions of the fitted models occur only by chance, or whether they depend on the truth and can be explained. In the hypothesis testing framework, we would like to test whether the actuals and the predicted values are independent of each other. Technically, the null hypothesis is that the prediction is independent of the actual, and if a model explains the truth, the null hypothesis must be rejected; we would then conclude that the fitted model's predictions depend on the truth. We deploy two tests here. The chi-square test addresses the independence hypothesis directly, while the McNemar test compares the two off-diagonal counts, that is, whether the two kinds of misclassification occur equally often:

> chisq.test(table(LR_Predict_Bin,testY_numeric))
Pearson's Chi-squared test with Yates' continuity correction
data:  table(LR_Predict_Bin, testY_numeric)
X-squared = 370.53501, df = 1, p-value < 0.00000000000000022204
> chisq.test(table(NN_Predict,HT2_TestY))
Pearson's Chi-squared test with Yates' continuity correction
data:  table(NN_Predict, HT2_TestY)
X-squared = 377.22569, df = 1, p-value < 0.00000000000000022204
> chisq.test(table(NB_predict,HT2_TestY))
Pearson's Chi-squared test with Yates' continuity correction
data:  table(NB_predict, HT2_TestY)
X-squared = 375.18659, df = 1, p-value < 0.00000000000000022204
> chisq.test(table(CT_predict,HT2_TestY))
Pearson's Chi-squared test with Yates' continuity correction
data:  table(CT_predict, HT2_TestY)
X-squared = 498.44791, df = 1, p-value < 0.00000000000000022204
> chisq.test(table(SVM_predict,HT2_TestY))
Pearson's Chi-squared test with Yates' continuity correction
data:  table(SVM_predict, HT2_TestY)
X-squared = 462.41803, df = 1, p-value < 0.00000000000000022204
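
To make the chi-square statistic less of a black box, the logistic regression value can be reproduced by hand. This is a minimal sketch that re-enters the counts from the first table; the names lr_cm, expected, and x_squared are illustrative, and the formula is the usual Yates-corrected Pearson statistic for a 2 x 2 table.

```r
# Logistic regression counts copied from the table above
# (rows = predicted, columns = actual); matrix() fills column by column.
lr_cm <- matrix(c(32, 10, 7, 587), nrow = 2)
n <- sum(lr_cm)

# Expected counts if the predictions were independent of the actuals
expected <- outer(rowSums(lr_cm), colSums(lr_cm)) / n

# Pearson statistic with Yates' continuity correction (df = 1)
x_squared <- sum((abs(lr_cm - expected) - 0.5)^2 / expected)
p_value <- pchisq(x_squared, df = 1, lower.tail = FALSE)

# Cross-check against the built-in test. One expected count is below 5,
# so chisq.test() may warn about the approximation; suppressed here.
builtin <- suppressWarnings(chisq.test(lr_cm))

print(round(expected, 3))
print(c(by_hand = x_squared, built_in = unname(builtin$statistic)))
```

Under independence only about 2.6 of the 636 observations would be expected in the cell where both the prediction and the actual are hypothyroid, against the 32 observed, which is what drives the statistic to roughly 370.5.
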
> mcnemar.test(table(LR_Predict_Bin,testY_numeric))
McNemar's Chi-squared test with continuity correction
data:  table(LR_Predict_Bin, testY_numeric)
McNemar's chi-squared = 0.23529412, df = 1, p-value = 0.6276258
> mcnemar.test(table(NN_Predict,HT2_TestY))
McNemar's Chi-squared test with continuity correction
data:  table(NN_Predict, HT2_TestY)
McNemar's chi-squared = 17.391304, df = 1, p-value = 0.00003042146
> mcnemar.test(table(NB_predict,HT2_TestY))
McNemar's Chi-squared test with continuity correction
data:  table(NB_predict, HT2_TestY)
McNemar's chi-squared = 0, df = 1, p-value = 1
> mcnemar.test(table(CT_predict,HT2_TestY))
McNemar's Chi-squared test
data:  table(CT_predict, HT2_TestY)
McNemar's chi-squared = 0, df = 1, p-value = 1
> mcnemar.test(table(SVM_predict,HT2_TestY))
McNemar's Chi-squared test with continuity correction
data:  table(SVM_predict, HT2_TestY)
McNemar's chi-squared = 2.5, df = 1, p-value = 0.1138463
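
The McNemar statistic uses only the two off-diagonal counts of each table. The sketch below recomputes the values above from those counts; the function mcnemar_by_hand and its argument names are hypothetical, and the rule of dropping the continuity correction when the two counts are equal is an assumption made to match the classification tree output, which is reported without the correction.

```r
# n12: negative cases predicted as hypothyroid (upper-right cell)
# n21: hypothyroid cases predicted as negative (lower-left cell)
mcnemar_by_hand <- function(n12, n21, correct = TRUE) {
  # Apply the continuity correction only when the two counts differ
  adj <- if (correct && n12 != n21) 1 else 0
  stat <- (abs(n12 - n21) - adj)^2 / (n12 + n21)
  c(statistic = stat,
    p.value   = pchisq(stat, df = 1, lower.tail = FALSE))
}

# Off-diagonal counts copied from the five misclassification tables
off_diag <- list(LR = c(7, 10), NN = c(22, 1), NB = c(8, 9),
                 CT = c(4, 4),  SVM = c(2, 8))

res <- t(sapply(off_diag, function(x) mcnemar_by_hand(x[1], x[2])))
print(round(res, 7))
```

A large statistic here means the two kinds of error are unbalanced, as with the 22 versus 1 split for the neural network; it does not by itself say how many errors a model makes in total.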

The chi-square tests clearly show that the predictions of each fitted model are not down to chance: with every p-value reported as essentially zero, the hypothesis that the predictions are independent of the actuals is rejected for all five models. The predictions of the hypothyroid cases, as well as of the negative cases, are therefore associated with the truth, as we expect of a useful fitted model. The interpretation of, and conclusions from, McNemar's test are left to the reader. The final important measure in classification problems is the ROC curve, which is considered next.