Decision Tree for Classification

[Diagram: the Play Tennis dataset split on the Outlook feature - Sunny, Overcast, Rainy]
Now, two questions arise.

Q1 : How do we know that we have got a Pure split?

            There are 2 methods to find it out :
                1) Entropy (Good)
                2) Gini Impurity (Better because it's faster)

Q2 : How are the features selected?
            Information Gain

1) Entropy :

H(S) = -Σ (i = 1 to c) p_i · log2(p_i)

Where:

  • c is the number of classes in the output.
  • p_i is the proportion of instances belonging to class i.
Example :
As you can see in the above diagram, for the component 'Overcast', we have 4 Yes, 0 No.
Total = 4
Now, let's calculate the Entropy for this component.

H(S) = -p_yes · log2(p_yes) - p_no · log2(p_no)


H(S) = -(4/4) · log2(4/4) - (0/4) · log2(0/4)

            = -1 · (0) - 0                      (∵ log2(1) = 0, and 0 · log2(0) is taken as 0)
          = 0
Therefore, we can say that Entropy is always 0 for a pure node.
Suppose, for some random component, we have 3 yes, 3 no.

H(S) = -(3/6) · log2(3/6) - (3/6) · log2(3/6)

Simplifying:

H(S) = -(1/2) · log2(1/2) - (1/2) · log2(1/2)

= -(1/2) · (-1) - (1/2) · (-1)

= 1/2 + 1/2

= 1

So, the entropy of this dataset is 1.

Entropy always lies between 0 and 1 (for binary classification).
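The two entropy calculations above can be sketched in a few lines of Python (a minimal sketch; the function name `entropy` and the counts-based interface are my own):

```python
from math import log2

def entropy(counts):
    """H(S) = -sum over classes of p_i * log2(p_i), with 0*log2(0) taken as 0."""
    total = sum(counts)
    return sum(-(c / total) * log2(c / total) for c in counts if c > 0)

print(entropy([4, 0]))  # pure node (4 Yes, 0 No): prints 0.0
print(entropy([3, 3]))  # balanced node (3 Yes, 3 No): prints 1.0
```

Skipping the zero-count classes in the sum is what makes the pure-node case come out to exactly 0.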


2) Gini Impurity :


GI = 1 - Σ (i = 1 to c) p_i²

  1. For 4 instances of "yes" and 0 instances of "no":

    Here, p_yes = 4/4 = 1, and p_no = 0/4 = 0

    Plugging this into the Gini Impurity formula:

    GI = 1 - (1² + 0²) = 1 - 1 = 0

    So, the Gini Impurity for this scenario is 0.

  2. For 3 instances of "yes" and 3 instances of "no":

    Here, p_yes = 3/6 = 1/2 and p_no = 3/6 = 1/2, because there are equal instances of both classes.

    Plugging this into the Gini Impurity formula:

    GI = 1 - ((3/6)² + (3/6)²) = 1 - (9/36 + 9/36) = 1 - (18/36) = 1 - (1/2) = 1/2

    So, the Gini Impurity for this scenario is 0.5

    The Gini Impurity ranges from 0 to 0.5 for binary classification problems.
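The Gini Impurity formula can be sketched the same way (a minimal sketch; the function name `gini` and the counts-based interface are my own):

```python
def gini(counts):
    """Gini Impurity: GI = 1 - sum over classes of p_i**2."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

print(gini([4, 0]))  # pure node (4 Yes, 0 No): prints 0.0
print(gini([3, 3]))  # balanced node (3 Yes, 3 No): prints 0.5
```

Note that there is no logarithm here, only squaring, which is why it is cheaper to compute than entropy.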





    Entropy takes more computation power because it involves calculating logarithms,
     so Gini Impurity is faster to compute than Entropy.

3) Information Gain :


IG(S, f1) = H(S) - Σ (v ∈ Values(f1)) (|S_v| / |S|) · H(S_v)

In the diagram, for the feature Outlook, we have 8 Yes and 6 No. Total = 14
Sunny - 2 Yes | 3 No
Overcast - 4 Yes | 0 No
Rainy - 2 Yes | 3 No
Let's calculate information gain for it,

IG(S, Outlook) = H(S) - ( (5/14) · H(S_sunny) + (4/14) · H(S_overcast) + (5/14) · H(S_rainy) )

Now, assume that for two features f1 and f2,
IG(S, f1) = 0.049 and IG(S, f2) = 0.051

We use the feature with the highest information gain, so we go for f2.
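Putting the pieces together, the information gain for Outlook can be computed end to end (a sketch; `information_gain` and the nested-counts interface are my own, and the counts follow the 8 Yes / 6 No example above):

```python
from math import log2

def entropy(counts):
    """H(S) = -sum over classes of p_i * log2(p_i), with 0*log2(0) taken as 0."""
    total = sum(counts)
    return sum(-(c / total) * log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, subset_counts):
    """IG(S, f) = H(S) - sum over v of (|S_v| / |S|) * H(S_v)."""
    total = sum(parent_counts)
    weighted = sum(sum(s) / total * entropy(s) for s in subset_counts)
    return entropy(parent_counts) - weighted

# Outlook: Sunny (2 Yes, 3 No), Overcast (4 Yes, 0 No), Rainy (2 Yes, 3 No)
ig_outlook = information_gain([8, 6], [[2, 3], [4, 0], [2, 3]])
print(round(ig_outlook, 3))  # prints 0.292
```

At each node, the tree computes this gain for every candidate feature and splits on the one with the highest value, which is why f2 (0.051) beats f1 (0.049) in the example above.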
