
Mean shift clustering is a sliding-window-based algorithm that attempts to find dense areas of data points. This method is less sensitive to outliers (because of using the Median) but is much slower for larger datasets as sorting is required on each iteration when computing the Median vector. K-Medians is another clustering algorithm related to K-Means, except instead of recomputing the group center points using the mean we use the median vector of the group. Other cluster methods are more consistent. Thus, the results may not be repeatable and lack consistency.


K-means also starts with a random choice of cluster centers and therefore it may yield different clustering results on different runs of the algorithm. This isn’t always trivial and ideally with a clustering algorithm we’d want it to figure those out for us because the point of it is to gain some insight from the data. Firstly, you have to select how many groups/classes there are. On the other hand, K-Means has a couple of disadvantages. K-Means has the advantage that it’s pretty fast, as all we’re really doing is computing the distances between points and group centers very few computations! It thus has a linear complexity O( n). You can also opt to randomly initialize the group centers a few times, and then select the run that looks like it provided the best results.


Today, we’re going to look at 5 popular clustering algorithms that data scientists need to know and their pros and cons! K-Means Clustering In Data Science, we can use clustering analysis to gain some valuable insights from our data by seeing what groups the data points fall into when we apply a clustering algorithm. Clustering is a method of unsupervised learning and is a common technique for statistical data analysis used in many fields. In theory, data points that are in the same group should have similar properties and/or features, while data points in different groups should have highly dissimilar properties and/or features. Given a set of data points, we can use a clustering algorithm to classify each data point into a specific group. 😎Ĭlustering is a Machine Learning technique that involves the grouping of data points. Want to be inspired? Come join my Super Quotes newsletter. The 5 Clustering Algorithms Data Scientists Need to Know
