{% extends "layout.html" %} {% block content %}

📊 Interactive Dynamic K-Means Clustering Visualization

Explore K-Means, an unsupervised learning algorithm that partitions data into K distinct clusters. For example, an online store uses K-Means to group customers based on purchase frequency and spending, creating segments like Budget Shoppers, Frequent Buyers, and Big Spenders for personalized marketing.

How Does K-Means Clustering Work?

We are given a dataset of items, each described by a vector of feature values. The task is to group these items into clusters, and to do this we use the K-Means algorithm. The 'K' in the name is the number of groups/clusters we want to partition the items into.

The algorithm first randomly picks K central points called centroids; each data point is then assigned to its closest centroid, forming a cluster. Once all points are assigned, each centroid is updated to the average position of the points in its cluster. This assign-and-update process repeats until the centroids stop changing. The goal of clustering is to divide the data points into groups so that similar points end up in the same cluster.
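The loop just described can be sketched directly in NumPy. This is a minimal illustration, not a production implementation; the function name `kmeans` and its parameters are our own, not from any library:

```python
import numpy as np

def kmeans(points, k, max_iters=100, seed=0):
    """Minimal K-Means sketch: returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # 1. Initialize: pick K distinct data points as starting centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(max_iters):
        # 2. Assign: each point goes to its nearest centroid (Euclidean distance).
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # 3. Update: move each centroid to the mean of its assigned points
        #    (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # 4. Converge: stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels
```

On two well-separated blobs this converges in a handful of iterations; note that, like real K-Means, the result can depend on the random initialization.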

Data Flow in the K-Means Algorithm

1. Initialize Centroids: randomly pick K points as starting centroids.

2. Assign Points to Clusters: each point joins its closest centroid.

3. Update Centroids: calculate the new mean of each cluster.

4. Convergence Check: have the centroids stabilized?

The K-Means algorithm iteratively refines clusters. It starts by randomly selecting initial centroids. Then, each data point is assigned to the closest centroid, forming preliminary clusters. Next, the centroids are updated to the mean position of all points within their assigned clusters. This assignment and update process repeats until the centroids no longer change significantly, indicating that the clusters have converged.

Understanding K-Means Clustering

K-Means is an unsupervised learning algorithm that aims to partition $$n$$ observations into $$k$$ clusters in which each observation belongs to the cluster with the nearest mean (centroid), serving as a prototype of the cluster.

How the K-Means Algorithm Works:

  1. Initialization: Randomly select $$K$$ data points from the dataset as initial centroids.
  2. Assignment Step: Assign each data point to the cluster whose centroid is closest to it (based on Euclidean distance).
  3. Update Step: Recalculate each centroid as the mean of all data points assigned to its cluster.
  4. Iteration: Repeat steps 2 and 3 until the centroids no longer move significantly or a maximum number of iterations is reached.
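These same four steps are what scikit-learn's `KMeans` runs under the hood. A short usage sketch, assuming scikit-learn is installed; the customer data below is hypothetical, echoing the store example from the introduction:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customer data: (purchase frequency, spending per month).
X = np.array([[1, 20], [2, 25], [1, 22],      # budget shoppers
              [9, 30], [10, 28], [8, 35],     # frequent buyers
              [3, 200], [2, 220], [4, 210]])  # big spenders

# n_init controls how many random initializations are tried; the run with
# the lowest WCSS is kept, reducing sensitivity to the random start.
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(model.labels_)           # cluster index assigned to each customer
print(model.cluster_centers_)  # final centroids
print(model.inertia_)          # WCSS of the final clustering
```

Here `n_clusters` is the hyperparameter K, and `inertia_` is scikit-learn's name for the WCSS objective discussed below.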

The objective of K-Means is to minimize the sum of squared distances between data points and their assigned cluster's centroid, also known as the within-cluster sum of squares (WCSS).

$$ \min_{C} \sum_{i=1}^{k} \sum_{x \in C_i} \|x - \mu_i\|^2 $$

Where:

  • $$C$$ is the set of clusters.
  • $$C_i$$ is the $$i$$-th cluster.
  • $$x$$ is a data point.
  • $$\mu_i$$ is the centroid of cluster $$C_i$$.
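The WCSS objective translates almost line-for-line into code. A minimal sketch (the helper name `wcss` is our own):

```python
import numpy as np

def wcss(points, labels, centroids):
    """Within-cluster sum of squares: sum over clusters i of ||x - mu_i||^2
    for every point x assigned to cluster i."""
    return sum(
        np.sum((points[labels == i] - mu) ** 2)
        for i, mu in enumerate(centroids)
    )
```

For example, two points at $$(0,0)$$ and $$(0,2)$$ with a shared centroid at $$(0,1)$$ each contribute a squared distance of 1, giving a WCSS of 2.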

Key Concepts of K-Means:

  • Centroid: The mean position of all data points in a cluster.
  • K: The number of clusters to form. This is a hyperparameter that must be chosen before running the algorithm.
  • Voronoi Diagram: The partitioning of the plane into regions based on distance to points in a specific subset of the plane. In K-Means, these regions represent the areas where new points would be assigned to a particular cluster.
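The Voronoi view is also how new points are labeled after training: a point falls in whichever centroid's cell it lands in, i.e. it is assigned to the nearest centroid. A minimal sketch (the function name `assign` and the centroid values are illustrative):

```python
import numpy as np

def assign(new_points, centroids):
    """Label each new point with the index of its nearest centroid.
    Each centroid's assignment region is one Voronoi cell."""
    d = np.linalg.norm(new_points[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

centroids = np.array([[0., 0.], [10., 10.]])
new_points = np.array([[1., 1.], [9., 9.]])
print(assign(new_points, centroids))  # one cluster index per new point
```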
{% endblock %}