Deep dive into supervised learning algorithms

Assume a dataset has predictor attributes x1, x2, ..., xn and an objective attribute y. Supervised learning is the machine learning task of finding a prediction function from this dataset. The learning algorithm takes both the predictor attributes and the objective attribute as input, and the resulting function maps the predictor attributes to the objective attribute with minimal error, even for unseen data that is not in the training dataset.

The data used to arrive at the prediction function is called the training data. It consists of a set of training examples, where each example pairs an input object, X (typically a vector), with a desired output value, Y. A supervised learning algorithm analyzes the training data and produces an inferred function that maps inputs to outputs and can also be applied to new, unseen examples:

Y = f(X) + error
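
As a minimal sketch of this relationship, the following Python snippet generates training data from a hypothetical "true" function f(x) = 2x + 1 plus random noise, infers a function from that data with ordinary least squares, and applies it to an input that was not in the training set. The function true_f, the noise level, the helper names, and the choice of least squares are all assumptions made for illustration.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable


def true_f(x):
    # Hypothetical "true" relationship, chosen only for illustration.
    return 2.0 * x + 1.0


# Training data: inputs X and noisy outputs Y = f(X) + error.
X_train = [i / 10 for i in range(50)]  # 0.0, 0.1, ..., 4.9
Y_train = [true_f(x) + random.gauss(0.0, 0.1) for x in X_train]


def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(X_train, Y_train)


def predict(x):
    # The inferred function: an estimate of f learned from the training data.
    return slope * x + intercept


x_new = 7.5  # an unseen input, outside the training inputs
print("predicted:", predict(x_new), "noise-free value:", true_f(x_new))
```

The prediction will not match the noise-free value exactly, because the inferred function is only an estimate of f and the training outputs include the error term.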

This whole category of algorithms is called supervised learning because both the input and the output variables are used for learning. The learning is supervised in the sense that the training data provides the input as well as the expected output for every training instance.

Supervised algorithms work with both predictor attributes and an objective function. The predictor attributes are the items in a set of data items that are used to predict the objective function. The objective function is the goal of the machine learning task: it usually takes in the predictor attributes, perhaps along with some other computation, and usually outputs a single numeric value.

Once we have defined a machine learning problem that calls for supervised learning, the next step is to choose the algorithm that will solve it. This is the toughest task, because there is a huge number of learning algorithms to choose from, and selecting the most suitable one is difficult.

Professor Pedro Domingos has provided a simple reference architecture (https://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf) on which we can base the algorithm selection. It uses three critical components that any machine learning algorithm requires, as follows:

  • Representation: The way the model is represented so that it can be understood by the computer. It can also be considered as the hypothesis space within which the model would act.
  • Evaluation: For each algorithm or model, there needs to be an evaluation or scoring function to determine which one performs better. The scoring function would be different for each type of algorithm.
  • Optimization: A method to search among the models in the hypothesis space for the highest-scoring one. The choice of optimization technique is integral to the efficiency of the learner, and also helps determine which model is produced if the evaluation function has more than one optimum.
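
The three components above can be sketched in a few lines of Python. In this illustrative example, the representation is a straight line y = w*x + b, the evaluation function is the mean squared error, and the optimization method is plain gradient descent. All names, the toy data, the learning rate, and the step count are assumptions picked for the sketch. Because the error is minimized here, the "highest-scoring" model corresponds to the one with the lowest error.

```python
# Representation: the hypothesis space is every line y = w * x + b.
def predict(params, x):
    w, b = params
    return w * x + b


# Evaluation: mean squared error of a candidate model (lower is better).
def mse(params, xs, ys):
    return sum((predict(params, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Optimization: gradient descent searches the hypothesis space for the
# parameters with the lowest error.
def gradient_descent(xs, ys, lr=0.05, steps=2000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [predict((w, b), x) - y for x, y in zip(xs, ys)]
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 3x - 2 (hypothetical, noise-free).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [3.0 * x - 2.0 for x in xs]

start = (0.0, 0.0)
best = gradient_descent(xs, ys)
print("error before:", mse(start, xs, ys))
print("error after: ", mse(best, xs, ys))
print("learned (w, b):", best)
```

Swapping any one component, for example a decision tree as the representation or a different scoring function as the evaluation, gives a different learner while the overall structure stays the same.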

Supervised learning problems can be further grouped into regression and classification problems:

  • Classification: When the output variable is a category, such as green or red, or good or bad.
  • Regression: When the output variable is a real value, such as dollars or weight.
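
To make the distinction concrete, the sketch below uses the same predictor with two different output variables. It relies on a one-nearest-neighbour lookup, a stand-in chosen only for brevity and not one of the algorithms covered in this section. The apple sizes, colours, and prices are invented for illustration.

```python
def nearest_neighbour_predict(train_x, train_y, x):
    """Return the target of the training example whose input is closest to x."""
    closest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[closest]


# Hypothetical training data: apple diameter in centimetres.
sizes = [5.0, 6.0, 7.0, 8.0]

# Classification target: the output variable is a category.
colours = ["green", "green", "red", "red"]

# Regression target: the output variable is a real value (price in dollars).
prices = [0.40, 0.50, 0.65, 0.80]

new_size = 7.4
predicted_colour = nearest_neighbour_predict(sizes, colours, new_size)
predicted_price = nearest_neighbour_predict(sizes, prices, new_size)

print("classification output:", predicted_colour)
print("regression output:", predicted_price)
```

The input is identical in both calls. Only the type of the output variable decides whether the problem is classification or regression.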

In this section, we will go through the following supervised learning algorithms with easy-to-understand examples:

  • Naive Bayes
  • Decision trees
  • Linear regression
  • Logistic regression
  • Support vector machines
  • Random forest