Decision Tree for Classification
Now, two questions arise.

Q1: How do we know that we have a pure split? There are two ways to measure it: 1) Entropy (good) 2) Gini Impurity (better in practice, because it is faster to compute).

Q2: How are the features selected? Using Information Gain.

1) Entropy:

$$H(S) = -\sum_{i=1}^{c} p_i \log_2(p_i)$$

Where: $c$ is the number of classes in the output and $p_i$ is the proportion of samples belonging to class $i$.

Example: As you can see in the diagram above, for the component 'Overcast' we have 4 Yes and 0 No, so Total = 4. Now, let's calculate the entropy for this component:

$$H(S) = -p_{\text{yes}} \log_2(p_{\text{yes}}) - p_{\text{no}} \log_2(p_{\text{no}})$$

$$H(S) = -\frac{4}{4}\log_2\!\left(\frac{4}{4}\right) - \frac{0}{4}\log_2\!\left(\frac{0}{4}\right) = -1(0) - 0 = 0$$

An entropy of 0 means the node is completely pure: every sample in it belongs to the same class.
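To make the calculation concrete, here is a minimal Python sketch of the entropy formula above. The function name `entropy` and the choice to pass raw class counts (rather than probabilities) are illustrative assumptions, not part of the original text; the by-hand convention that a class with zero samples contributes nothing is standard for Shannon entropy.

```python
import math

def entropy(class_counts):
    """Shannon entropy of a node, given the number of samples in each class."""
    total = sum(class_counts)
    h = 0.0
    for count in class_counts:
        if count == 0:
            continue  # 0 * log2(0) is taken as 0, so empty classes add nothing
        p = count / total
        h -= p * math.log2(p)
    return h

# 'Overcast' node from the example: 4 Yes, 0 No -> perfectly pure split
print(entropy([4, 0]))  # 0.0

# For comparison, an evenly mixed node (2 Yes, 2 No) gives the maximum entropy for two classes
print(entropy([2, 2]))  # 1.0
```

The second call shows the other extreme: a 50/50 split between the two classes yields an entropy of 1, the most impure a binary node can be.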