KNN Algorithm

KNN stands for K-Nearest Neighbour

It is used to solve both Classification & Regression problems.

1. KNN FOR CLASSIFICATION



Here, we have one group of Green points and one group of Red points.

Now, for yellow data point, we want to know whether it will belong to Red or Green group.

Let's say, k=3 

We will measure the distance and get 3 nearest data points from yellow.

As you can see in the figure, 3 nearest data points will include 2 Green and 1 Red. 

Hence, we will put it in Green category.

We measure the distance by 2 methods :

1) Eucledian Distance :

2) Manhanttan Distance :


Manhattan Distance = | x 1 − x 2 | + | y 1 − y 2 |




2. KNN FOR REGRESSION




If k=4, it will be average of 4 nearest data points.

How to Choose the K-factor?

Have a look at the below graphs for training error and validation error for different values of k.




First Fig shows error in training set and Second Fig shows the error in validation set.

For smaller values of k, training error is near to 0 and validation error is too high, it means model is overfitting. 

For higher values of k, both training and validation error is high. 
If you observe closely, the validation error curve reaches a minimum at a value of k = 9. This value of k is the optimum value of the model
Researchers typically use the elbow curve, named for its resemblance to an elbow, to determine the k value.

KNN works very bad for 2 situations :

1) Outliers

Consider black data points are working as an outliers. It will choose nearest points from outliers as well.


2) Imbalanced Data

K-Nearest Neighbors (KNN) algorithm can struggle with imbalanced datasets because it relies on the majority class to determine the class of a new data point.
In imbalanced datasets, where the minority class has fewer instances, their representation in the feature space might not be accurately reflected in the distance calculations, leading to misclassification.


Comments

Popular posts from this blog

Extracting Tables and Text from Images Using Python

Positional Encoding in Transformer

Chain Component in LangChain