Advantages of the decision tree algorithm

The goal of a decision tree is to arrive at the best choice for the given problem, with each final leaf node representing that choice. The algorithm is greedy: at every split it picks the decision that looks best at that point, rather than searching over all possible trees.

The whole problem is divided into multiple sub-problems, with each sub-problem branching out into further sub-problems. The resulting subsets are evaluated using a measure called purity. A node is 100% pure when all of the data it contains belongs to the same class, and it is impure when the data is mixed across classes and can still be split further. The goal of the algorithm is to reach 100% purity in the leaf nodes of the tree.
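As a minimal sketch of this idea, the check below (the helper name `is_pure` is just illustrative, not part of any standard library) tests whether every sample at a node carries the same class label:

```python
def is_pure(labels):
    """A node is 100% pure when every sample in it has the same class label."""
    return len(set(labels)) <= 1

print(is_pure(["spam", "spam", "spam"]))  # True  -> 100% pure, becomes a leaf
print(is_pure(["spam", "ham", "spam"]))   # False -> impure, can still be split
```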

One standard way to measure the purity of a node is Gini impurity, which is commonly used to decide how to split a node in a decision tree.
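Gini impurity is defined as one minus the sum of squared class proportions at the node: it is 0 for a 100% pure node and grows as the classes become more mixed. The short sketch below computes it from a list of class labels (the function name `gini_impurity` is illustrative):

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity: 1 - sum of squared class proportions at the node."""
    counts = Counter(labels)
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in counts.values())

print(gini_impurity(["yes", "yes", "yes", "yes"]))  # 0.0 -> pure node
print(gini_impurity(["yes", "yes", "no", "no"]))    # 0.5 -> maximally mixed two-class node
```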

The other metric used in decision trees is information gain, which decides which feature of the dataset to split on at each step. Information gain is the decrease in entropy (randomness) after the dataset is split on an attribute. Constructing a decision tree is therefore about finding the attributes that return the highest information gain, that is, the splits that produce the most homogeneous branches, where the data in each branch belongs as much as possible to a single class.
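To make the definition concrete, the sketch below computes entropy from class labels and then information gain as the parent's entropy minus the weighted average entropy of the child nodes (the function names and the tiny example dataset are illustrative assumptions):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of the class labels at a node."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(parent_labels, child_label_groups):
    """Decrease in entropy after splitting the parent node into the given children."""
    total = len(parent_labels)
    weighted_child_entropy = sum(
        (len(group) / total) * entropy(group) for group in child_label_groups
    )
    return entropy(parent_labels) - weighted_child_entropy

# Splitting a perfectly mixed parent into two pure children yields the maximum gain of 1 bit.
parent = ["yes", "yes", "no", "no"]
print(information_gain(parent, [["yes", "yes"], ["no", "no"]]))  # 1.0
```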