Decision Tree for Classification

[Diagram: the Play Tennis dataset split on the Outlook feature - Sunny, Overcast, Rainy]
Now, two questions arise.

Q1 : How do we know that we have got a Pure split?

            There are 2 methods to find it out :
                1) Entropy (Good)
                2) Gini Impurity (Better because it's faster)

Q2 : How are the features selected?
            Information Gain

1) Entropy :

H(S) = -Σ (i = 1 to c) p_i · log2(p_i)

Where:

  • c is the number of classes in the output.
  • p_i is the proportion of instances belonging to class i.
Example :
As you can see in the above diagram, for the component 'Overcast', we have 4 Yes, 0 No.
Total = 4
Now, let's calculate the Entropy for this component.

H(S) = -p_yes · log2(p_yes) - p_no · log2(p_no)


H(S) = -(4/4) · log2(4/4) - (0/4) · log2(0/4)

            = -1 · (0) - 0                      (∵ log2(1) = 0, and 0 · log2(0) is taken as 0)
          = 0
Therefore, we can say that Entropy is always 0 for a pure node.
Suppose, for some random component, we have 3 yes, 3 no.

H(S) = -(3/6) · log2(3/6) - (3/6) · log2(3/6)

Simplifying:

H(S) = -(1/2) · log2(1/2) - (1/2) · log2(1/2)

= -(1/2) · (-1) - (1/2) · (-1)

= 1/2 + 1/2

= 1

So, the entropy of this dataset is 1.

Entropy always lies between 0 and 1 (for binary classification).
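The two entropy calculations above can be sketched in a few lines of Python (a minimal sketch; the function name `entropy` and the counts-based interface are my own):

```python
from math import log2

def entropy(counts):
    """H(S) = -sum over classes of p_i * log2(p_i), with 0*log2(0) taken as 0."""
    total = sum(counts)
    return sum(-(c / total) * log2(c / total) for c in counts if c > 0)

print(entropy([4, 0]))  # pure node (4 Yes, 0 No): prints 0.0
print(entropy([3, 3]))  # balanced node (3 Yes, 3 No): prints 1.0
```

Skipping the zero-count classes in the sum is what makes the pure-node case come out to exactly 0.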


2) Gini Impurity :


GI = 1 - Σ (i = 1 to c) p_i²

  1. For 4 instances of "yes" and 0 instances of "no":

    Here, p_yes = 4/4 = 1, and p_no = 0/4 = 0

    Plugging this into the Gini Impurity formula:

    GI = 1 - (1² + 0²) = 1 - 1 = 0

    So, the Gini Impurity for this scenario is 0.

  2. For 3 instances of "yes" and 3 instances of "no":

    Here, p_yes = 3/6 = 1/2 and p_no = 3/6 = 1/2, because there are equal instances of both classes.

    Plugging this into the Gini Impurity formula:

    GI = 1 - ((3/6)² + (3/6)²) = 1 - (9/36 + 9/36) = 1 - (18/36) = 1 - (1/2) = 1/2

    So, the Gini Impurity for this scenario is 0.5

    The Gini Impurity ranges from 0 to 0.5 for binary classification problems.
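The Gini Impurity formula can be sketched the same way (a minimal sketch; the function name `gini` and the counts-based interface are my own):

```python
def gini(counts):
    """Gini Impurity: GI = 1 - sum over classes of p_i**2."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

print(gini([4, 0]))  # pure node (4 Yes, 0 No): prints 0.0
print(gini([3, 3]))  # balanced node (3 Yes, 3 No): prints 0.5
```

Note that there is no logarithm here, only squaring, which is why it is cheaper to compute than entropy.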





    Entropy takes more computation power because it involves calculating logarithms,
     so Gini Impurity is faster to compute than Entropy.

3) Information Gain :


IG(S, f1) = H(S) - Σ (v ∈ Values(f1)) (|S_v| / |S|) · H(S_v)

In the diagram, for the feature Outlook, we have 8 Yes and 6 No. Total = 14
Sunny - 2 Yes | 3 No
Overcast - 4 Yes | 0 No
Rainy - 2 Yes | 3 No
Let's calculate information gain for it,

IG(S, Outlook) = H(S) - ( (5/14) · H(S_sunny) + (4/14) · H(S_overcast) + (5/14) · H(S_rainy) )

Now, assume that for two features f1 and f2,
IG(S, f1) = 0.049 and IG(S, f2) = 0.051

We use the feature with the highest information gain, so we go for f2.
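Putting the pieces together, the information gain for Outlook can be computed end to end (a sketch; `information_gain` and the nested-counts interface are my own, and the counts follow the 8 Yes / 6 No example above):

```python
from math import log2

def entropy(counts):
    """H(S) = -sum over classes of p_i * log2(p_i), with 0*log2(0) taken as 0."""
    total = sum(counts)
    return sum(-(c / total) * log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, subset_counts):
    """IG(S, f) = H(S) - sum over v of (|S_v| / |S|) * H(S_v)."""
    total = sum(parent_counts)
    weighted = sum(sum(s) / total * entropy(s) for s in subset_counts)
    return entropy(parent_counts) - weighted

# Outlook: Sunny (2 Yes, 3 No), Overcast (4 Yes, 0 No), Rainy (2 Yes, 3 No)
ig_outlook = information_gain([8, 6], [[2, 3], [4, 0], [2, 3]])
print(round(ig_outlook, 3))  # prints 0.292
```

At each node, the tree computes this gain for every candidate feature and splits on the one with the highest value, which is why f2 (0.051) beats f1 (0.049) in the example above.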
