Decision Tree for Classification

[Diagram: a sample weather dataset split on the Outlook feature (Sunny / Overcast / Rainy), with Yes/No counts at each node — referenced in the examples below.]
Now, two questions arise.

Q1 : How do we know that we have got a pure split?

            There are 2 methods to find it out :
                1) Entropy (Good)
                2) Gini Impurity (Better, because it's faster)

Q2 : How are the features selected?
            Information Gain

1) Entropy :

H(S) = −∑ᵢ₌₁ᶜ pᵢ·log₂(pᵢ)

Where:

  • 𝑐 is the number of classes in the output.
  • pᵢ is the proportion of samples belonging to class i.
Example :
As you can see in the above diagram, for the component 'Overcast', we have 4 Yes and 0 No.
Total = 4
Now, let's calculate the entropy for this component.

H(S) = −p_yes·log₂(p_yes) − p_no·log₂(p_no)

H(S) = −(4/4)·log₂(4/4) − (0/4)·log₂(0/4)

            = −1·(0) − 0                      (∵ log₂ 1 = 0, and 0·log₂ 0 is taken as 0)
            = 0
Therefore, we can say that entropy is always 0 for a pure node.
Now suppose, for some random component, we have 3 Yes and 3 No.

Simplifying:

H(S) = −(1/2)·log₂(1/2) − (1/2)·log₂(1/2)

= (1/2)·1 + (1/2)·1                      (∵ log₂(1/2) = −1)

= 1/2 + 1/2

= 1

So, the entropy of this dataset is 1.

For binary classification, entropy always lies between 0 and 1.
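The two calculations above can be sketched in a few lines of Python; `entropy` here is an illustrative helper, not from any particular library.

```python
import math

def entropy(counts):
    """Entropy H(S) = -sum(p_i * log2(p_i)) over per-class counts."""
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c == 0:          # 0 * log2(0) is taken as 0
            continue
        p = c / total
        h -= p * math.log2(p)
    return h

print(entropy([4, 0]))  # pure node -> 0.0
print(entropy([3, 3]))  # evenly mixed -> 1.0
```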


2) Gini Impurity :


GI = 1 − ∑ᵢ₌₁ᶜ pᵢ²

  1. For 4 instances of "yes" and 0 instances of "no":

    Here, p_yes = 4/4 = 1, and p_no = 0/4 = 0

    Plugging this into the Gini Impurity formula:

    GI = 1 − (1² + 0²) = 1 − 1 = 0

    So, the Gini Impurity for this scenario is 0.

  2. For 3 instances of "yes" and 3 instances of "no":

    Here, p_yes = 3/6 = 1/2 and p_no = 3/6 = 1/2, because there are equal instances of both classes.

    Plugging this into the Gini Impurity formula:

    GI = 1 − ((3/6)² + (3/6)²) = 1 − (9/36 + 9/36) = 1 − 18/36 = 1 − 1/2 = 1/2

    So, the Gini Impurity for this scenario is 0.5.

    The Gini Impurity ranges from 0 to 0.5 for binary classification problems.





    Entropy requires computing logarithms, which takes more computation power.
     So, Gini Impurity is faster to compute than Entropy.
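The Gini calculation above can be sketched the same way; `gini` is an illustrative helper name.

```python
def gini(counts):
    """Gini Impurity GI = 1 - sum(p_i^2) over per-class counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

print(gini([4, 0]))  # pure node -> 0.0
print(gini([3, 3]))  # evenly mixed -> 0.5
```

Note that `gini` needs only squaring and division — no logarithm — which is why it is cheaper to evaluate than entropy.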

3) Information Gain :


IG(S, f1) = H(S) − ∑_{v ∈ Values(f1)} (|Sv| / |S|)·H(Sv)

In the diagram, for the feature Outlook, we have 8 Yes and 6 No. Total = 14
  • Sunny – 2 Yes | 3 No
  • Overcast – 4 Yes | 0 No
  • Rainy – 2 Yes | 3 No
Let's calculate information gain for it,

IG(S, Outlook) = H(S) − ((5/14)·H(S_sunny) + (4/14)·H(S_overcast) + (5/14)·H(S_rainy))
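This calculation can be checked numerically with a short sketch; the counts are taken from the example above, and `entropy` is the same illustrative helper as before.

```python
import math

def entropy(counts):
    """Entropy H(S) = -sum(p_i * log2(p_i)) over per-class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

parent = [8, 6]                        # Outlook parent node: 8 Yes, 6 No
children = [[2, 3], [4, 0], [2, 3]]    # Sunny, Overcast, Rainy

total = sum(parent)
weighted = sum(sum(ch) / total * entropy(ch) for ch in children)
ig = entropy(parent) - weighted
print(round(ig, 4))  # ≈ 0.2917
```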

Now, assume that for two features f1 and f2,
IG(S, f1) = 0.049 and IG(S, f2) = 0.051

We use the feature with the highest information gain for the split, so we go for f2.
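The selection rule can be sketched in one line of Python, using the gains from the text (the feature names `f1` and `f2` are the hypothetical ones above):

```python
# Among candidate features, split on the one with the highest information gain.
gains = {"f1": 0.049, "f2": 0.051}
best_feature = max(gains, key=gains.get)
print(best_feature)  # -> f2
```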
