Training the model

Believe it or not, OpenCV does not offer any good implementation of linear regression. Some people online suggest cv2.fitLine, but that function fits a line to a set of 2D or 3D points, which is not the same as multivariate linear regression on a feature matrix. This is a perfect opportunity to get familiar with scikit-learn's API:

In [6]: linreg = linear_model.LinearRegression()
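This snippet relies on names set up earlier in the chapter. If you are starting from a fresh session, something along the following lines should define them; note that the module aliases modsel and metrics are assumptions based on how they are used below, and that load_boston is only available in older scikit-learn releases (it was removed in version 1.2):

    from sklearn import linear_model                 # provides LinearRegression
    from sklearn import datasets                     # provides the Boston housing data
    from sklearn import metrics                      # provides mean_squared_error
    from sklearn import model_selection as modsel    # provides train_test_split

    boston = datasets.load_boston()                  # 506 samples, 13 features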

Next, we want to split the data into training and test sets. We are free to make the split as we see fit, but it is usually a good idea to reserve between 10 percent and 30 percent of the data for testing. Here, we choose 10 percent, using the test_size argument:

In [7]: X_train, X_test, y_train, y_test = modsel.train_test_split(
...         boston.data, boston.target, test_size=0.1,
...         random_state=42
...     )
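As a quick sanity check (not part of the original session), you can inspect the shapes of the resulting arrays; with 506 samples in the dataset, a 10 percent test split should leave roughly 455 training and 51 test samples:

    print(X_train.shape, X_test.shape)   # expected: (455, 13) (51, 13)
    print(y_train.shape, y_test.shape)   # expected: (455,) (51,)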

In scikit-learn, the training function is called fit, but it otherwise behaves just like train does in OpenCV:

In [8]: linreg.fit(X_train, y_train)
Out[8]: LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1,
normalize=False)
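Once fit has been called, the learned parameters are stored as attributes of the estimator. As a brief aside (not shown in the original session), you can inspect the slope learned for each of the 13 features and the intercept like this:

    print(linreg.coef_)        # one weight per feature (13 values for Boston)
    print(linreg.intercept_)   # the bias term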

We can compute the mean squared error on the training set by comparing the true housing prices, y_train, to the model's predictions, linreg.predict(X_train):

In [9]: metrics.mean_squared_error(y_train, linreg.predict(X_train))
Out[9]: 22.739484154236614
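The mean squared error is simply the average of the squared residuals, so the same number can be reproduced by hand with NumPy. The following is just an equivalent sketch of what metrics.mean_squared_error computes:

    import numpy as np

    y_pred = linreg.predict(X_train)
    mse = np.mean((y_train - y_pred) ** 2)   # average squared residual
    print(mse)                               # should match Out[9] above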

The score method of the linreg object returns the coefficient of determination (R squared):

In [10]: linreg.score(X_train, y_train)
Out[10]: 0.73749340919011974
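Under the hood, the R squared score compares the residual sum of squares to the total variance of the target values. A minimal NumPy sketch of that computation, which should reproduce the value above, looks like this:

    import numpy as np

    y_pred = linreg.predict(X_train)
    ss_res = np.sum((y_train - y_pred) ** 2)            # residual sum of squares
    ss_tot = np.sum((y_train - np.mean(y_train)) ** 2)  # total sum of squares
    r_squared = 1 - ss_res / ss_tot
    print(r_squared)                                    # should match Out[10] above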