Decision Trees
Objective
Recursively split the dataset until the tree is left with pure leaf nodes, i.e. each leaf contains samples of only one class.
Decision trees are greedy (each split is chosen locally, with no lookahead) and supervised (they learn from labelled data).
ID3 Algorithm
- Calculate the entropy of the dataset.
- Split the data on the attribute that gives the highest information gain.
- Create a decision node for that attribute.
- Repeat recursively on each subset until the leaves are pure (or no attributes remain).
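The steps above can be sketched in Python. This is a minimal illustration, assuming categorical attributes and rows represented as dicts; the helper names (`entropy`, `best_attribute`, `id3`) are my own:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_attribute(rows, labels, attributes):
    """Pick the attribute whose split yields the highest information gain."""
    base = entropy(labels)
    def gain(attr):
        subsets = {}
        for row, label in zip(rows, labels):
            subsets.setdefault(row[attr], []).append(label)
        return base - sum(len(s) / len(labels) * entropy(s)
                          for s in subsets.values())
    return max(attributes, key=gain)

def id3(rows, labels, attributes):
    """Build a tree: dict nodes are decisions, plain values are leaves."""
    if len(set(labels)) == 1:           # pure node -> leaf
        return labels[0]
    if not attributes:                  # no attributes left -> majority vote
        return Counter(labels).most_common(1)[0][0]
    attr = best_attribute(rows, labels, attributes)
    tree = {attr: {}}
    remaining = [a for a in attributes if a != attr]
    for v in {row[attr] for row in rows}:
        sub = [(r, l) for r, l in zip(rows, labels) if r[attr] == v]
        sub_rows, sub_labels = zip(*sub)
        tree[attr][v] = id3(list(sub_rows), list(sub_labels), remaining)
    return tree
```

For example, `id3([{"windy": "y"}, {"windy": "y"}, {"windy": "n"}, {"windy": "n"}], ["stay", "stay", "go", "go"], ["windy"])` returns `{"windy": {"y": "stay", "n": "go"}}`.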
There are two types of nodes: decision nodes (which test an attribute) and leaf nodes (which assign a class).
Note
When a non-pure leaf node is unavoidable (likely when the data becomes complex or noisy), classify based on the majority class among the samples in that leaf.
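A one-liner sketch of that majority-vote rule (the function name is my own):

```python
from collections import Counter

def majority_class(labels):
    # Classify an impure leaf by the most common class among its samples.
    return Counter(labels).most_common(1)[0][0]
```

For example, `majority_class(["spam", "ham", "spam"])` returns `"spam"`.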
Entropy
Measures the amount of uncertainty or impurity in the dataset.
E(S) = -Σ p_i * log2(p_i)
- p_i: probability of class i (the proportion of samples in S belonging to class i)
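A direct translation of the entropy formula into Python (a sketch, computing class probabilities from label counts):

```python
import math
from collections import Counter

def entropy(labels):
    """E(S) = -sum(p_i * log2(p_i)) over the classes present in S."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())
```

A pure node has entropy 0; a 50/50 two-class node has entropy 1 bit.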
Information Gain
Measures the reduction in entropy or Gini impurity after a dataset is split on an attribute.
IG(S, A) = E(S) - Σ_v (|S_v| / |S|) * E(S_v)
- S_v: the subset of S where attribute A takes value v
- e.g. a split that separates the classes perfectly reduces each child's entropy to 0, giving the maximum possible gain, IG = E(S).
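Information gain can be sketched as the parent's entropy minus the size-weighted entropy of the children (function names are my own):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent_labels, child_label_groups):
    """IG = E(parent) - weighted average entropy of the children."""
    n = len(parent_labels)
    weighted = sum(len(g) / n * entropy(g) for g in child_label_groups)
    return entropy(parent_labels) - weighted
```

A perfect split of `["y", "y", "n", "n"]` into `["y", "y"]` and `["n", "n"]` gives a gain of 1 bit; a split that leaves the class mix unchanged gives a gain of 0.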
Gini Impurity Index
Measures how often a randomly chosen sample would be misclassified if it were labelled according to the class distribution of the node.
G(S) = 1 - Σ p_i^2
- p_i: probability of class i
- G = 0 for a pure node; Gini is a computationally cheaper alternative to entropy since it avoids logarithms.
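A minimal sketch of Gini impurity, G(S) = 1 - Σ p_i², computed from label counts (the function name is my own):

```python
from collections import Counter

def gini(labels):
    """1 - sum(p_i^2): 0 for a pure node, 0.5 for a 50/50 two-class node."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())
```

For example, `gini(["a", "a", "b", "b"])` returns `0.5`, while `gini(["a", "a"])` returns `0.0`.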