Decision Tree for Classification
Now, two questions arise.
Q1 : How do we know that we have a pure split?
There are 2 methods to find this out :
1) Entropy (Good)
2) Gini Impurity (Better because it's faster)
Q2 : How are the features selected?
Information Gain
1) Entropy :
$H(S) = -\sum_{i=1}^{c} p_i \log_2(p_i)$
Where:
- $c$ is the number of classes in the output.
- $p_i$ is the proportion of samples belonging to class $i$.
Example :
As you can see in the diagram above, for the component 'Overcast', we have 4 Yes and 0 No.
Total = 4
Now, let's calculate the entropy for this component.
$H(S) = -p_{yes}\log_2(p_{yes}) - p_{no}\log_2(p_{no})$
$H(S) = -\frac{4}{4}\log_2\left(\frac{4}{4}\right) - \frac{0}{4}\log_2\left(\frac{0}{4}\right)$
$= -1(0) - 0$   (∵ $\log_2 1 = 0$, and $0 \log_2 0$ is taken as 0)
$= 0$
Therefore, we can say that entropy is always 0 for a pure node.
Suppose, for some random component, we have 3 Yes and 3 No.
$H(S) = -\frac{3}{6}\log_2\left(\frac{3}{6}\right) - \frac{3}{6}\log_2\left(\frac{3}{6}\right)$
Simplifying:
$= 0.5 + 0.5 = 1$
So, the entropy of this component is 1.
Entropy always lies between 0 and 1 (for a binary classification problem).
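To verify these two values in code, here is a minimal sketch of an entropy function (the helper name `entropy` is my own choice, not something from the post):

```python
from math import log2

def entropy(class_counts):
    """H(S) = -sum(p_i * log2(p_i)), with 0 * log2(0) taken as 0."""
    total = sum(class_counts)
    h = 0.0
    for count in class_counts:
        if count == 0:
            continue                 # 0 * log2(0) is treated as 0
        p = count / total            # p_i for this class
        h -= p * log2(p)
    return h

print(entropy([4, 0]))  # pure node (4 Yes, 0 No)       -> 0.0
print(entropy([3, 3]))  # perfectly mixed (3 Yes, 3 No) -> 1.0
```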
2) Gini Impurity :
$GI = 1 - \sum_{i=1}^{c} p_i^2$
For 4 instances of "yes" and 0 instances of "no":
Here, $p_{yes} = \frac{4}{4} = 1$ and $p_{no} = \frac{0}{4} = 0$.
Plugging this into the Gini Impurity formula:
$GI = 1 - (1^2 + 0^2) = 1 - 1 = 0$
So, the Gini Impurity for this scenario is 0.
For 3 instances of "yes" and 3 instances of "no":
Here, $p_{yes} = p_{no} = \frac{3}{6} = 0.5$, because there are equal instances of both classes.
Plugging this into the Gini Impurity formula:
$GI = 1 - (0.5^2 + 0.5^2) = 1 - 0.5 = 0.5$
So, the Gini Impurity for this scenario is 0.5.
The Gini Impurity ranges from 0 to 0.5 for binary classification problems.
Entropy takes more computation power because it has to calculate logarithms.
So, Gini Impurity is faster than Entropy.
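As a quick check of both values, here is a minimal Gini Impurity sketch (the helper name `gini_impurity` is mine, not from the post):

```python
def gini_impurity(class_counts):
    """GI = 1 - sum(p_i^2) over the classes in a node."""
    total = sum(class_counts)
    return 1.0 - sum((count / total) ** 2 for count in class_counts)

print(gini_impurity([4, 0]))  # pure node (4 Yes, 0 No)    -> 0.0
print(gini_impurity([3, 3]))  # evenly mixed (3 Yes, 3 No) -> 0.5
```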
3) Information Gain :
$IG(S, f_1) = H(S) - \sum_{v \in Values(f_1)} \frac{|S_v|}{|S|} H(S_v)$
where $S_v$ is the subset of $S$ for which feature $f_1$ takes the value $v$.
In the diagram, for the feature Outlook, we have 8 Yes and 6 No. Total = 14
Sunny - 2 Yes | 3 No
Overcast - 4 Yes | 0 No
Rainy - 2 Yes | 3 No
Let's calculate the information gain for it:
$IG(S, Outlook) = H(S) - \left(\frac{5}{14} \times H(S_{sunny}) + \frac{4}{14} \times H(S_{overcast}) + \frac{5}{14} \times H(S_{rainy})\right)$
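To make the gain concrete, here is the numeric working (this evaluation is mine; it simply plugs the 2/3, 4/0, 2/3 counts above into the formula):
$H(S) = -\frac{8}{14}\log_2\left(\frac{8}{14}\right) - \frac{6}{14}\log_2\left(\frac{6}{14}\right) \approx 0.985$
$H(S_{sunny}) = H(S_{rainy}) = -\frac{2}{5}\log_2\left(\frac{2}{5}\right) - \frac{3}{5}\log_2\left(\frac{3}{5}\right) \approx 0.971$
$H(S_{overcast}) = 0$
$IG(S, Outlook) \approx 0.985 - \left(\frac{5}{14}(0.971) + \frac{4}{14}(0) + \frac{5}{14}(0.971)\right) \approx 0.985 - 0.694 \approx 0.29$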
Now, assume that for two features f1 and f2,
$IG(S, f_1) = 0.049$ and $IG(S, f_2) = 0.051$.
We use the feature with the highest gain, so we go for $f_2$.
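Putting the pieces together, here is a small sketch that computes the gain for the Outlook split above and then picks the feature with the highest gain (the helper names `entropy` and `information_gain` are my own, not from the post):

```python
from math import log2

def entropy(class_counts):
    """H(S) = -sum(p_i * log2(p_i)), with 0 * log2(0) taken as 0."""
    total = sum(class_counts)
    return sum(-(c / total) * log2(c / total) for c in class_counts if c > 0)

def information_gain(parent_counts, children_counts):
    """IG = H(parent) - weighted average of the children's entropies."""
    total = sum(parent_counts)
    weighted = sum((sum(child) / total) * entropy(child) for child in children_counts)
    return entropy(parent_counts) - weighted

# Outlook example from the post: 8 Yes / 6 No overall,
# split into Sunny (2/3), Overcast (4/0), Rainy (2/3).
print(round(information_gain([8, 6], [[2, 3], [4, 0], [2, 3]]), 3))  # -> 0.292

# Feature selection: pick the feature with the highest information gain.
gains = {"f1": 0.049, "f2": 0.051}
print(max(gains, key=gains.get))  # -> f2
```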