Cross validation with scikit-learn API

The advantage of cross validation over repeated random sub-sampling is that all of the observations are used for both training and validation, and each observation is used for validation exactly once.
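To make the "exactly once" property concrete, here is a minimal pure-Python sketch of how k-fold splitting partitions sample indices. The helper name k_fold_indices is hypothetical and only illustrates the idea; it assumes contiguous, unshuffled folds and is not the scikit-learn implementation.

```python
def k_fold_indices(n_samples, n_splits):
    """Yield (train_indices, validation_indices) for contiguous, unshuffled folds."""
    fold_sizes = [n_samples // n_splits] * n_splits
    for i in range(n_samples % n_splits):
        fold_sizes[i] += 1  # spread any remainder over the first folds
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop


# Count how often each of 10 toy observations lands in a validation fold.
validation_counts = [0] * 10
for train, validation in k_fold_indices(n_samples=10, n_splits=5):
    for idx in validation:
        validation_counts[idx] += 1
    print("train:", train, "validation:", validation)

print(validation_counts)  # every entry is 1
```

Every index appears in exactly one validation fold and in the training portion of the other four runs, which is the property described above.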

The following code shows how to implement five-fold cross validation in Keras. We use the entire dataset (training and testing sets combined) and print the score of the network averaged over the cross validation runs. In each run, the model is trained on four of the five splits and tested on the remaining one. We use the scikit-learn API wrapper provided by Keras and leverage the Keras regressor, along with sklearn's standard scaler, k-fold cross-validator, and score evaluator:

import numpy as np
import pandas as pd

from keras.models import Sequential
from keras.layers import Dense
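# Note: newer Keras releases no longer ship this wrapper; the separate
# SciKeras package offers a similar KerasRegressor.
from keras.wrappers.scikit_learn import KerasRegressor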


from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold

from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

from keras.datasets import boston_housing
(x_train,y_train),(x_test,y_test) = boston_housing.load_data()

x_train.shape, x_test.shape

---------------------------------------------------------------
((404, 13), (102, 13))
----------------------------------------------------------------

# Merge the training and test sets so that cross validation uses all 506 observations
x_train = np.concatenate((x_train, x_test), axis=0)
y_train = np.concatenate((y_train, y_test), axis=0)

x_train.shape, y_train.shape

-----------------------------------------------------------------
((506, 13), (506,))
-----------------------------------------------------------------

You will notice that we construct a function named baseline_model() to build our network. This is a useful way of constructing networks in many scenarios, but here it lets us pass the model-building function to the KerasRegressor wrapper from the scikit-learn API that Keras provides. As many of you may well be aware, scikit-learn has long been a go-to Python library for ML, with all sorts of preprocessing, scaling, normalizing, and algorithmic implementations. The Keras creators implemented a scikit-learn wrapper to enable a certain degree of cross functionality between these libraries:

def baseline_model():
    # One hidden layer with 13 units, matching the 13 input features
    model = Sequential()
    model.add(Dense(13, input_dim=13, kernel_initializer='normal',
                    activation='relu'))
    # Single linear output unit for the regression target
    model.add(Dense(1, kernel_initializer='normal'))
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model
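As a quick sanity check on the size of this network, the following sketch counts its trainable parameters by hand. The helper dense_params is a hypothetical name introduced only for this illustration; it assumes each fully connected layer has one weight per input-unit pair plus one bias per unit.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units


hidden = dense_params(n_inputs=13, n_units=13)  # 13 * 13 weights + 13 biases
output = dense_params(n_inputs=13, n_units=1)   # 13 weights + 1 bias
total = hidden + output

print(hidden, output, total)  # 182 14 196
```

With fewer than 200 parameters, this is a deliberately small baseline.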

We will take advantage of this cross functionality to perform our k-fold cross validation, as we did previously. Firstly, we initialize NumPy's random number generator with a constant seed. This aims to give us consistency in initializing our model weights, so that we can compare future models on an equal footing (depending on the backend, its own random seed may also need to be set):

# Set seed for reproducibility
seed = 7
np.random.seed(seed)

# Add a data scaler and the Keras regressor containing our model function to a list of estimators

estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasRegressor(build_fn=baseline_model,
                                         epochs=100, batch_size=5, verbose=0)))

#add our estimator list to a Sklearn pipeline

pipeline = Pipeline(estimators)

#initialize instance of k-fold validation from sklearn api

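# shuffle=True is needed for random_state to take effect; recent scikit-learn
# versions raise an error if random_state is set without it
kfold = KFold(n_splits=5, shuffle=True, random_state=seed)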

#pass pipeline instance, training data and labels, and k-fold crossvalidator instance to evaluate score

results = cross_val_score(pipeline, x_train, y_train, cv=kfold)

# The results variable contains the negated mean squared errors for each of our
# 5 cross validation runs.
print("Average MSE of all 5 runs: %.2f, with standard dev: (%.2f)" %
      (-1 * results.mean(), results.std()))


------------------------------------------------------------------
Model Type: <function larger_model at 0x000001454959CB70>
MSE per fold:
[-11.07775911 -12.70752338 -17.85225084 -14.55760158 -17.3656806 ]
Average MSE of all 5 runs: 14.71, with standard dev: (2.61)

In the preceding code, we create a list of estimators to pass to the sklearn pipeline, which scales and processes our data in sequence. To scale our values this time, we simply use the StandardScaler() preprocessing class from sklearn and append it to our list. We also append the Keras wrapper object to the same list. This Keras wrapper object is a regression estimator called KerasRegressor; it takes the model function we created, along with the desired batch size and number of training epochs, as arguments. The verbose argument controls how much feedback you see during the training process. By setting it to 0, we ask our model to train silently.
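To see what the scaling step does when it sits inside a pipeline, consider the following pure-Python sketch. It assumes a single hypothetical feature column, and the helpers fit_scaler and transform are illustrative names rather than scikit-learn functions. The point is that the mean and standard deviation are learned from the training portion only and then reused on the held-out portion, which is how a pipeline behaves when it is refit on each cross validation run.

```python
from statistics import mean, pstdev


def fit_scaler(column):
    """Learn the mean and population standard deviation of one feature column."""
    return mean(column), pstdev(column)


def transform(column, mu, sigma):
    """Standardize values using previously learned statistics."""
    return [(value - mu) / sigma for value in column]


# Hypothetical single-feature data, split into a training fold and a held-out fold.
train_fold = [2.0, 4.0, 6.0, 8.0]
held_out_fold = [5.0, 10.0]

mu, sigma = fit_scaler(train_fold)                      # statistics from the training fold only
train_scaled = transform(train_fold, mu, sigma)         # zero mean, unit standard deviation
held_out_scaled = transform(held_out_fold, mu, sigma)   # same statistics reused here

print(train_scaled)
print(held_out_scaled)
```

The held-out values are not guaranteed to have zero mean or unit variance after scaling, since their statistics never influenced the scaler.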

Note that these are the same parameters that you would otherwise pass along to the .fit() function of the model, as we did earlier to initiate our training sessions.

Running the preceding code gives us an estimate of the average performance of our network over the five cross-validation runs we executed. The results variable stores the negated MSE score of our network for each run of the cross validator. We then print out the mean and standard deviation (the square root of the variance) of the MSEs over all five runs. Notice that we multiplied our mean value by -1. This is simply an implementation detail: the unified scoring API of scikit-learn follows the convention that higher scores are better, whereas we are trying to minimize our MSE. Hence, scores that need to be minimized are negated so that the unified scoring API can work correctly, and the score that is returned is the negative of the actual MSE.
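The sign convention can be checked by hand with a short sketch that uses only the standard library. The per-fold values below are copied from the output shown earlier; pstdev is used on the assumption that we want the population standard deviation, which is what NumPy's std() computes by default.

```python
from statistics import mean, pstdev, pvariance

# Per-fold scores copied from the run shown earlier (negated MSE values).
neg_mse_scores = [-11.07775911, -12.70752338, -17.85225084,
                  -14.55760158, -17.3656806]

# Flip the sign to recover the actual MSE of each fold.
mse_per_fold = [-score for score in neg_mse_scores]

avg_mse = mean(mse_per_fold)
std_mse = pstdev(mse_per_fold)  # square root of the population variance

print("Average MSE: %.2f, standard deviation: %.2f" % (avg_mse, std_mse))
# Average MSE: 14.71, standard deviation: 2.61

# Under the "higher is better" convention, the best fold has the largest score,
# which corresponds to the smallest MSE.
best_fold = max(range(len(neg_mse_scores)), key=lambda i: neg_mse_scores[i])
print("Best fold:", best_fold, "with MSE %.2f" % mse_per_fold[best_fold])
```

Because negation reverses the ordering, picking the maximum score and picking the minimum MSE select the same fold.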