Gradient boosting

Boosting methods can become complicated to learn and understand, but keep in mind what's fundamentally happening behind the curtain. The main idea is to build an initial model of some kind (linear, spline, tree, and so on) called the base learner, examine its residuals, and then fit a new model to those residuals, guided by a so-called loss function. A loss function is simply a function that measures the discrepancy between the model's prediction and the desired outcome, for example, squared error for regression or logistic loss (log loss) for classification. The process repeats, with each new model fitted to the errors that remain, until it reaches a specified stopping criterion. This is like the student who takes a practice exam, gets 30 out of 100 questions wrong and, as a result, studies only the 30 questions that were missed. In the next practice exam, they get 10 of those 30 wrong and so focus only on those 10 questions, and so on.

If you would like to explore the theory behind this further, a great resource is Natekin A., Knoll A. (2013), Gradient boosting machines, a tutorial, Frontiers in Neurorobotics, available at http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3885826/.
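To make the two loss functions mentioned above concrete, here is a minimal sketch in plain Python (not the R packages used in this chapter). The function names are illustrative assumptions, and the squared-error loss carries a conventional factor of one half so that its negative gradient is exactly the ordinary residual. The point is that the "residuals" each new model is fitted to are the negative gradient of the loss with respect to the current prediction.

```python
import math


def squared_error(y, pred):
    """Squared-error loss for one observation (regression), scaled by 1/2."""
    return 0.5 * (y - pred) ** 2


def squared_error_negative_gradient(y, pred):
    """Negative gradient with respect to pred: the ordinary residual."""
    return y - pred


def log_loss(y, score):
    """Logistic (log) loss for a label y in {0, 1} and a score on the log-odds scale."""
    p = 1.0 / (1.0 + math.exp(-score))
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def log_loss_negative_gradient(y, score):
    """Negative gradient with respect to score: label minus predicted probability."""
    p = 1.0 / (1.0 + math.exp(-score))
    return y - p


# Regression: the observed value is 3.0 and the current prediction is 1.0.
print(squared_error(3.0, 1.0), squared_error_negative_gradient(3.0, 1.0))

# Classification: the true label is 1 and the current score is 0 (probability 0.5).
print(log_loss(1, 0.0), log_loss_negative_gradient(1, 0.0))
```

In both cases the quantity passed to the next model shrinks toward zero as the prediction approaches the observed outcome, which is what drives the process toward its stopping criterion.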

As just mentioned, boosting can be applied to many different base learners, but here we'll focus only on tree-based learning. Each tree in the sequence is small, and we'll control how small with one of the tuning parameters, referred to as interaction depth. In fact, a tree may be as small as a single split, in which case it is referred to as a stump.

Trees are sequentially fitted to the residuals, according to the loss function, up to the number of trees that we specified (our stopping criterion).
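The sequential fitting just described can be sketched in a few lines of plain Python. This is an illustration only, under several assumptions that are not from the text: a single numeric feature, squared-error loss, stumps as the trees, and a shrinkage factor (called learning_rate here) that scales each tree's contribution. It is not how the Xgboost package is implemented, and all function names are hypothetical.

```python
def fit_stump(x, residuals):
    """Return (threshold, left_value, right_value) for the best single split.

    Assumes x holds at least two distinct values.
    """
    best = None
    # Skip the largest value: splitting there would leave the right side empty.
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        sse = (sum((r - left_value) ** 2 for r in left)
               + sum((r - right_value) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_value, right_value)
    return best[1:]


def stump_predict(stump, xi):
    threshold, left_value, right_value = stump
    return left_value if xi <= threshold else right_value


def boost(x, y, n_trees, learning_rate=0.1):
    """Fit n_trees stumps in sequence, each to the current residuals."""
    base = sum(y) / len(y)          # initial model: the mean of the response
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_trees):        # stopping criterion: a fixed number of trees
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        preds = [pi + learning_rate * stump_predict(stump, xi)
                 for xi, pi in zip(x, preds)]
    return base, learning_rate, stumps


def predict(model, xi):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, xi) for s in stumps)


def mse(model, x, y):
    return sum((yi - predict(model, xi)) ** 2 for xi, yi in zip(x, y)) / len(y)


# Made-up toy data: a noisy step from roughly 1 to roughly 3.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
for n_trees in (0, 1, 10, 50):
    print(n_trees, round(mse(boost(x, y, n_trees), x, y), 4))
```

On this toy data the training error falls as trees are added, because each stump corrects part of what the previous ones missed. Training error alone says nothing about overfitting, which is why the number of trees is treated as a parameter to tune.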

There are a number of parameters that require tuning in the model-building process using the Xgboost package, whose name stands for eXtreme Gradient Boosting. This package has become quite popular in online data contests because of its winning performance. There's excellent background material on boosted trees and on Xgboost at the following website:

http://xgboost.readthedocs.io/en/latest/model.html

In the practical examples, we'll learn how to begin to optimize the hyperparameters and produce meaningful output and predictions. These parameters can interact with each other, so if you tinker with just one without considering the others, your model's performance may worsen. The caret package will help us in the tuning endeavor.