Imal Perera  

K-Means Algorithm


k-means is one of the most widely used clustering algorithms. In the algorithm, k is a value that we need to provide, and the algorithm divides a given set of n observations into k clusters. Each observation is assigned to a cluster based on the distance between the observation and the cluster's center.

In other words, an observation belongs to the cluster whose center is nearest to it.
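As a small illustration of this rule, the sketch below assigns one observation to the nearest of three cluster centers. The coordinates and the names `observation`, `centers` and `nearest_center` are made up for this example. It measures distance with `math.dist` from the standard library (Python 3.8+), which computes the Euclidean distance.

```python
import math

def nearest_center(observation, centers):
    """Return the index of the center closest to the observation."""
    distances = [math.dist(observation, c) for c in centers]
    return distances.index(min(distances))

# Hypothetical data: one 2D observation and three cluster centers.
observation = (2.0, 3.0)
centers = [(1.0, 1.0), (2.5, 3.5), (8.0, 8.0)]

cluster = nearest_center(observation, centers)
print(cluster)  # index of the closest center
```

Here the second center, (2.5, 3.5), is the closest one, so the observation would belong to that cluster.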

 

Let's look at what you get after performing the k-means algorithm

Let's assume there are 10 observations that can be visualized in 2D space like below

kmeans

After performing the k-means algorithm, you get the cluster centers identified like below. In this case k is equal to 3. This is a very obvious situation, but the algorithm can do the same in more complicated situations.

 

kmeans21

 

Let's learn Euclidean distance

Before you go further, it is important to know about Euclidean distance. Euclidean distance is simply the straight-line distance between two given coordinates. This is how you calculated it in school for two coordinates in 2D space

euclidean
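Written out in standard notation, and assuming the two points are labelled $(x_1, y_1)$ and $(x_2, y_2)$ (labels chosen here for illustration), the 2D distance is:

$$
d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}
$$

For example, for the points $(1, 2)$ and $(4, 6)$ this gives $d = \sqrt{3^2 + 4^2} = \sqrt{25} = 5$.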

But Euclidean distance is not only for 2D space. It is defined for multidimensional space, which means you can use it to find the distance between two points with any number of dimensions, like below

euclidian
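To make the multidimensional case concrete, here is a minimal sketch of the distance as a Python function. The function name `euclidean_distance` and the sample points are chosen for this example only. It sums the squared differences over every dimension and takes the square root.

```python
import math

def euclidean_distance(p, q):
    """Distance between two points with the same number of dimensions.

    Computes sqrt((p1 - q1)^2 + (p2 - q2)^2 + ... + (pn - qn)^2).
    """
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# 2D example: differences are 3 and 4.
print(euclidean_distance((1, 2), (4, 6)))              # 5.0
# 4D example: differences are 1, 2, 2 and 4.
print(euclidean_distance((1, 2, 3, 4), (2, 4, 5, 8)))  # 5.0
```

The same function works for 2, 3 or any number of dimensions, as long as both points have the same length.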

 

Here are the steps of the k-means algorithm

  1. Input: a set of observations x1, …, xn and a value for k
  2. Place k initial cluster centers randomly
  3. Calculate the Euclidean distance from each cluster center to each observation. Each observation (vector) is assigned to the cluster whose center is at the smallest distance from it
  4. Since the cluster centers were first selected randomly, we need to find new cluster centers. To do that, we take the mean of the points in each cluster and assign that value as the new cluster center.
  5. After step 4, repeat steps 3 and 4 until the observations assigned to each cluster no longer change. At this point you have found the clusters

Note: sometimes this 3-4 process can take a very large number of iterations to settle. In such cases we stop the algorithm after a defined maximum number of loops.
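The steps above can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation. The names `k_means`, `max_iterations` and `seed` are hypothetical. Picking k of the observations as the initial centers is one common way to "place centers randomly", and keeping the old center when a cluster ends up empty is also an assumption made here.

```python
import math
import random

def k_means(observations, k, max_iterations=100, seed=None):
    """Cluster observations (equal-length tuples) into k groups.

    Returns (centers, assignments), where assignments[i] is the index
    of the center that observations[i] belongs to.
    """
    rng = random.Random(seed)
    # Step 2: pick k distinct observations as the initial centers (assumption).
    centers = rng.sample(observations, k)
    assignments = None

    for _ in range(max_iterations):  # cap on the number of loops
        # Step 3: assign each observation to its nearest center.
        new_assignments = [
            min(range(k), key=lambda j: math.dist(obs, centers[j]))
            for obs in observations
        ]
        # Step 5: stop once no observation changes cluster.
        if new_assignments == assignments:
            break
        assignments = new_assignments

        # Step 4: move each center to the mean of its assigned points.
        for j in range(k):
            members = [o for o, a in zip(observations, assignments) if a == j]
            if members:  # keep the old center if the cluster is empty
                centers[j] = tuple(sum(dim) / len(members) for dim in zip(*members))

    return centers, assignments

# Hypothetical data: 10 observations in 2D space, clustered with k = 3.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (8.5, 9.0), (9.0, 8.0),
          (1.0, 9.0), (1.5, 8.5), (2.0, 9.5), (1.0, 8.0)]
centers, labels = k_means(points, k=3, seed=0)
print(centers)
print(labels)
```

Because the starting centers are random, different seeds can give different final clusters, so it is common to run the algorithm several times and compare the results.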

Here is the process visually

step1step2

endofalgo
