Naive Bayes Algorithm





Naive Bayes is used to solve classification problems.

This algorithm is based on Bayes' theorem, but before looking at Bayes' theorem, let's revise some probability concepts:

  •  Independent Events :

e.g., rolling a die

Prob. of 1 on top : 1/6

Prob. of 2 on top : 1/6

Prob. of 5 on top : 1/6

Here, every roll is independent of the others: the result of one roll does not change the probabilities for the next roll.

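  • Dependent Events :

e.g., a box has 3 red balls and 2 blue balls, and we draw two balls one after the other without putting the first one back.

Probability of taking a red ball first : 3/5

Probability of then taking a blue ball : 2/4

Once one ball is removed, the total number of balls for the next draw is reduced by 1, so the second probability depends on the first draw. Since these are dependent events, the probability of both happening is: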

P(A and B) = P(A)P(B|A)
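
The product rule can be checked on the box example by listing every ordered pair of draws. This is a minimal sketch in Python; the ball labels and variable names are made up for illustration and are not from the original post.

```python
from fractions import Fraction
from itertools import permutations

# Box from the example: 3 red balls and 2 blue balls (labels are illustrative).
balls = ["R1", "R2", "R3", "B1", "B2"]

# Every ordered way to draw two balls without replacement (5 * 4 = 20 pairs).
draws = list(permutations(balls, 2))

# A = first ball is red, B = second ball is blue.
red_then_blue = [d for d in draws if d[0].startswith("R") and d[1].startswith("B")]
p_a_and_b = Fraction(len(red_then_blue), len(draws))

# Product rule: P(A and B) = P(A) * P(B|A) = 3/5 * 2/4.
p_product = Fraction(3, 5) * Fraction(2, 4)

print(p_a_and_b, p_product)  # 3/10 3/10
```

Both routes give 3/10, so counting outcomes directly agrees with multiplying P(A) by P(B|A).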

We know that,

P(A and B) = P(B and A)

P(A)P(B|A) = P(B)P(A|B)

Dividing both sides by P(B) gives Bayes' Theorem :

P(A\mid B)=\frac {P(B\mid A) \cdot P(A)}{P(B)}
A, B = events
P(A|B) = probability of A given B
P(B|A) = probability of B given A
P(A), P(B) = the probabilities of A and B on their own (with P(B) ≠ 0)
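
As a small illustration, the theorem can be applied to the box of 3 red and 2 blue balls from earlier. The sketch below assumes A = "first ball is red" and B = "second ball is blue"; the helper name `bayes` is hypothetical.

```python
from fractions import Fraction

def bayes(p_b_given_a, p_a, p_b):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    if p_b == 0:
        raise ValueError("P(B) must be non-zero")
    return p_b_given_a * p_a / p_b

# A = first ball is red, B = second ball is blue (no replacement).
p_a = Fraction(3, 5)
p_b_given_a = Fraction(2, 4)

# P(B) by total probability: (red first, then blue) + (blue first, then blue).
p_b = Fraction(3, 5) * Fraction(2, 4) + Fraction(2, 5) * Fraction(1, 4)

p_a_given_b = bayes(p_b_given_a, p_a, p_b)
print(p_a_given_b)  # 3/4
```

The result matches intuition: if the second ball is known to be blue, the first ball was one of the other 4 balls, 3 of which are red.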

For a dataset with,

Independent features : X1, X2, ..., Xn

Dependent feature (target) : y
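
Written out, the classifier applies Bayes' theorem to the target y given the features X1, ..., Xn, and then assumes the features are conditionally independent given y (the "naive" assumption). The block below is a sketch of that standard derivation, reconstructed here rather than taken from the original post:

```latex
P(y \mid X_1, \dots, X_n)
  = \frac{P(X_1, \dots, X_n \mid y) \cdot P(y)}{P(X_1, \dots, X_n)}
  = \frac{P(y) \prod_{i=1}^{n} P(X_i \mid y)}{P(X_1, \dots, X_n)}
```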










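Here, the denominator of Bayes' theorem, P(X1, X2, ..., Xn), is the same for P(Yes | Xi) and P(No | Xi) and is constant for a given input, so we can ignore it when comparing the two classes.

Each of the resulting scores is a product of the class prior and the probabilities of the individual features. Because the common denominator was dropped, P(Yes | Xi) and P(No | Xi) computed this way may not sum to 1.

So, we need to normalize them by dividing each score by the sum of both scores.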
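
The normalization step can be sketched as a small helper. The function name `normalize` and the two example scores are hypothetical and only show the mechanics.

```python
def normalize(scores):
    """Scale unnormalized class scores so that they sum to 1."""
    total = sum(scores.values())
    if total == 0:
        raise ValueError("at least one score must be non-zero")
    return {label: score / total for label, score in scores.items()}

# Hypothetical unnormalized scores for the two classes.
posterior = normalize({"Yes": 0.02, "No": 0.06})
print(posterior)
```

Dividing by the sum restores the effect of the dropped denominator, because that denominator is exactly the total of the per-class scores.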












Example :-


Qs : If the weather is Sunny & Hot, will he play tennis or not?

Here, as our output column (PlayTennis) has the values Yes and No, it is a binary classification problem.

We need to find P(Yes | Sunny, Hot) and P(No | Sunny, Hot). Ignoring the common denominator, each is proportional to (∝) the class prior times the feature probabilities:

P(Yes | Sunny, Hot) ∝ P(Yes) P(Sunny | Yes) P(Hot | Yes)

P(No | Sunny, Hot) ∝ P(No) P(Sunny | No) P(Hot | No)

Total No. of Yes in PlayTennis = 9
Total No. of No in PlayTennis = 5
Total : 9+5 = 14
P(Yes) = 9/14    P(No) = 5/14


P(Sunny | Yes) = 2/9
P(Sunny | No)=3/5


P(Hot | Yes) = 2/9
P(Hot | No) = 2/5


So, now put these values into our expressions :

P(Yes | Sunny, Hot) ∝ P(Yes) P(Sunny | Yes) P(Hot | Yes)

                    = (9/14)(2/9)(2/9)

                    = 0.0317

P(No | Sunny, Hot) ∝ P(No) P(Sunny | No) P(Hot | No)

                   = (5/14)(3/5)(2/5)
                   = 0.0857

Now, let's normalize these :

P(Yes | Sunny, Hot) = 0.0317 / (0.0317 + 0.0857)
                    = 0.27

P(No | Sunny, Hot) = 0.0857 / (0.0317 + 0.0857)
                   = 0.73

Here,
P(No | Sunny, Hot) > P(Yes | Sunny, Hot)
So, he will not play tennis if it's Sunny & Hot.
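
The whole calculation can be reproduced with exact fractions. This is a minimal sketch that uses only the counts quoted in the example; the dictionary layout and the function name `posterior` are assumptions made for illustration.

```python
from fractions import Fraction

# Counts quoted in the worked example.
class_counts = {"Yes": 9, "No": 5}
feature_counts = {
    "Yes": {"Sunny": 2, "Hot": 2},
    "No": {"Sunny": 3, "Hot": 2},
}

def posterior(observed):
    """Return normalized P(class | observed feature values)."""
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = Fraction(n, total)  # class prior, e.g. 9/14
        for value in observed:
            # Multiply by P(value | class), e.g. P(Sunny | Yes) = 2/9.
            score *= Fraction(feature_counts[label][value], n)
        scores[label] = score
    norm = sum(scores.values())  # plays the role of the dropped denominator
    return {label: s / norm for label, s in scores.items()}

result = posterior(["Sunny", "Hot"])
for label, p in result.items():
    print(label, p, round(float(p), 2))  # Yes 10/37 0.27, then No 27/37 0.73

prediction = max(result, key=result.get)
print(prediction)  # No
```

Working with `Fraction` avoids rounding the intermediate scores, and the final values still round to 0.27 and 0.73.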
