Machine Learning


Showing posts from July, 2024

DBSCAN Clustering

- July 29, 2024

DBSCAN (Density-Based Spatial Clustering of Applications with Noise)

Why do we need DBSCAN? K-Means clustering is a popular choice, but it struggles with noisy data: it forces every point, outliers included, into some cluster, so a few outliers can pull the centroids away from the dense groups. Enter DBSCAN, an algorithm that finds clusters as dense regions and labels the points in sparse regions as noise instead of assigning them to a cluster.

So now, let's understand how this algorithm works.

Key Ingredients of DBSCAN
Epsilon (ε): Think of this as the maximum distance between two points for them to be neighbors.
MinPts: The minimum number of points required within a point's ε-radius to form a dense region.
Core Point: A point with at least MinPts neighbors within its ε-radius.
Border Point: A point that lies within ε of a core point but has fewer than MinPts neighbors itself.
Noise Point: A point that is neither a core point nor a border point; these are the outliers.

Why Choose DBSCAN?
Outlier Detection: Naturally identifies noise points, making it great for spotting anomalies.
No Predefined Clusters: Unlike K-Means, you don’t need to specify the number of clusters beforehand.
Flexibility: Handles cl...
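The definitions above translate fairly directly into code. Below is a minimal pure-Python sketch (not the post's own code) of the classic DBSCAN procedure. It assumes a point counts as core when its ε-neighborhood, counting the point itself, holds at least MinPts points (the convention scikit-learn uses for `min_samples`). Clusters grow outward from core points, points reached only as neighbors become border points, and everything else is labeled -1 (noise). The names `region_query`, `point_types`, and `dbscan` are illustrative, and the brute-force neighbor search is chosen for clarity, not speed.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
NOISE = -1


def region_query(points: Sequence[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def point_types(points: Sequence[Point], eps: float, min_pts: int) -> List[str]:
    """Classify each point as 'core', 'border', or 'noise'."""
    neighborhoods = [region_query(points, i, eps) for i in range(len(points))]
    is_core = [len(nb) >= min_pts for nb in neighborhoods]
    types = []
    for i, nb in enumerate(neighborhoods):
        if is_core[i]:
            types.append("core")
        elif any(is_core[j] for j in nb):
            types.append("border")  # within eps of a core point, but not dense itself
        else:
            types.append("noise")
    return types


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Cluster ids 0, 1, ... per point, or NOISE (-1) for outliers."""
    labels: List[Optional[int]] = [None] * len(points)  # None = not visited yet
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabeled later if a core point reaches it
            continue
        cluster += 1  # i is an unvisited core point: start a new cluster
        labels[i] = cluster
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster but does not expand it
                continue
            if labels[j] is not None:
                continue  # already assigned
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also core: grow the cluster through it
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1),          # dense blob A
           (10, 10), (10, 11), (11, 10), (11, 11),  # dense blob B
           (2.4, 0),                                # near blob A, but sparse
           (50, 50)]                                # far from everything
    print(dbscan(pts, eps=1.5, min_pts=3))          # [0, 0, 0, 0, 1, 1, 1, 1, 0, -1]
    print(point_types(pts, eps=1.5, min_pts=3))
```

Because every point triggers its own ε-neighborhood scan, this sketch is O(n²). Library implementations such as scikit-learn's `DBSCAN` (parameters `eps` and `min_samples`, noise labeled -1) can use spatial indexes such as k-d trees or ball trees to speed up the neighbor queries. As in standard DBSCAN, a border point that is reachable from two clusters simply joins whichever cluster reaches it first.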

KMeans Clustering

- July 08, 2024

What is KMeans Clustering?
KMeans clustering is an unsupervised learning algorithm used to partition a dataset into K distinct, non-overlapping subsets or clusters. The goal is to group similar data points together while ensuring that data points in different clusters are as distinct as possible.

How Does KMeans Clustering Work?
Step 1: Initialize Centroids. Choose K initial centroids randomly from the data points. These centroids are the initial cluster centers.
Step 2: Assign Points to Clusters. Assign each data point to the nearest centroid, forming K clusters.
Step 3: Update Centroids. Calculate the new centroids as the mean of all data points assigned to each cluster.
Step 4: Repeat. Repeat steps 2 and 3 until the centroids no longer change or change only minimally.

Choosing the Right Number of Clusters in KMeans Clustering
Choosing the right number of clusters (K) is crucial for the effectiveness of KMeans clustering. Two popular methods to determine the optimal number of clusters ...
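The four steps above map onto a short loop. This is a minimal pure-Python sketch of the standard (Lloyd-style) procedure, not code from the post. The names `nearest`, `mean`, and `kmeans` are illustrative, and two details are assumptions added for this example: an optional `init` argument for passing fixed starting centroids, and the rule of keeping a centroid in place if its cluster ever becomes empty.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the closest centroid (Euclidean distance; ties go to the lower index)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def mean(cluster: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def kmeans(points: List[Point], k: int, init: Optional[List[Point]] = None,
           max_iter: int = 100, tol: float = 1e-9,
           seed: Optional[int] = None) -> Tuple[List[int], List[Point]]:
    # Step 1: use the given centroids, or pick K distinct data points at random.
    rng = random.Random(seed)
    centroids = list(init) if init is not None else rng.sample(points, k)
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Step 3: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Hypothetical empty-cluster rule: keep the old centroid (one simple choice).
            new_centroids.append(mean(members) if members else centroids[c])
        # Step 4: stop once no centroid moves more than tol.
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    # Final assignment so the labels match the returned centroids.
    labels = [nearest(p, centroids) for p in points]
    return labels, centroids


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    labels, centroids = kmeans(pts, k=2, init=[(0, 0), (11, 11)])
    print(labels)     # [0, 0, 0, 0, 1, 1, 1, 1]
    print(centroids)  # [(0.5, 0.5), (10.5, 10.5)]
```

Because Step 1 is random, different seeds can converge to different, sometimes poor, local optima, even on well-separated data. A common remedy is to run several initializations and keep the result with the lowest within-cluster sum of squares; scikit-learn's `KMeans` exposes this as its `n_init` parameter.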