Statistical/machine learning models

The previous section introduced a host of problems through real datasets, and we will now discuss some standard model variants that are useful for dealing with such problems. First, we set up the required mathematical framework.

Suppose that we have n independent pairs of observations, Statistical/machine learning models, where Statistical/machine learning models denotes the random variable of interest, also known as the dependent variable, regress and, endogenous variable, and so on. Statistical/machine learning models is the associated vector of explanatory variables, or independent/exogenous variables. The explanatory vector will consist of k elements, that is, Statistical/machine learning models. The data realized is of the form Statistical/machine learning models, where Statistical/machine learning models is the realized value (data) of random variable Statistical/machine learning models. A convention will be adapted throughout the book that Statistical/machine learning models, and this will take care of the intercept term. We assume that the observations are from the true distribution F, which is not completely known. The general regression model, including the classification model as well as the regression model, is specified by:

Statistical/machine learning models

Here, the function f is an unknown function and Statistical/machine learning models is the regression parameter, which captures the influence of Statistical/machine learning models on Statistical/machine learning models. The error Statistical/machine learning models is the associated unobservable error term. Diverse methods can be applied to model the relationship between the Ys and the xes. The statistical regression model focused on the complete specification of the error distribution Statistical/machine learning models, and in general the functional form would be linear as in Statistical/machine learning models. The function Statistical/machine learning models is the link function in the class of generalized linear models. Nonparametric and semiparametric regression models are more flexible, as we don't place a restriction on the error's probability distribution. Flexibility would come with a price though, and here we need a much higher number of observations to make a valid inference, although that number is unspecified and is often subjective.

The machine learning paradigm includes some black box methods, and we have a healthy overlap between this paradigm and non- and semi-parametric models. The reader is also cautioned that black box does not mean unscientific in any sense. The methods have a firm mathematical foundation and are reproducible every time. Next, we quickly review some of the most important statistical and machine learning models, and illustrate them through the datasets discussed earlier.