{% extends "layout.html" %} {% block content %}
Explore K-Means, an unsupervised learning algorithm that partitions data into K distinct clusters. For example, an online store uses K-Means to group customers based on purchase frequency and spending, creating segments like Budget Shoppers, Frequent Buyers, and Big Spenders for personalized marketing.
We are given a data set of items, each described by a vector of feature values. The task is to group these items into clusters. To achieve this we use the K-Means algorithm; the 'K' in the name represents the number of groups/clusters we want to partition our items into.
The algorithm works by first randomly picking K central points called centroids. Each data point is then assigned to its closest centroid, forming a cluster. Once all points are assigned, each centroid is updated to the average position of the points in its cluster. This assign-and-update process repeats until the centroids stop changing. The goal of clustering is to divide the data points into groups so that similar data points belong to the same group.
1. Initialize Centroids
Randomly pick K points
2. Assign Points to Clusters
To closest centroid
3. Update Centroids
Calculate new means
4. Convergence Check
Centroids stabilized?
The K-Means algorithm iteratively refines clusters. It starts by randomly selecting initial centroids. Then, each data point is assigned to the closest centroid, forming preliminary clusters. Next, the centroids are updated to the mean position of all points within their assigned clusters. This assignment and update process repeats until the centroids no longer change significantly, indicating that the clusters have converged.
K-Means is an unsupervised learning algorithm that aims to partition $$n$$ observations into $$k$$ clusters in which each observation belongs to the cluster with the nearest mean (centroid), serving as a prototype of the cluster.
The objective of K-Means is to minimize the sum of squared distances between data points and their assigned cluster's centroid, also known as the within-cluster sum of squares (WCSS):
$$WCSS = \sum_{i=1}^{k} \sum_{x \in C_i} \| x - \mu_i \|^2$$
Where:
$$C_i$$ is the set of data points assigned to cluster $$i$$
$$\mu_i$$ is the centroid (mean) of cluster $$i$$
$$\| x - \mu_i \|$$ is the Euclidean distance between point $$x$$ and its centroid
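The WCSS objective can be computed directly from a clustering. Below is a small sketch assuming NumPy; the helper name `wcss` and the toy data are our own, chosen so each point sits exactly distance 1 from its centroid:

```python
import numpy as np

def wcss(points, labels, centroids):
    """Within-cluster sum of squares: total squared distance
    from each point to the centroid of its assigned cluster."""
    return sum(((points[labels == j] - c) ** 2).sum()
               for j, c in enumerate(centroids))

# Two clusters; each centroid is the mean of its cluster's points.
pts = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 0.0], [12.0, 0.0]])
labels = np.array([0, 0, 1, 1])
cents = np.array([[1.0, 0.0], [11.0, 0.0]])
print(wcss(pts, labels, cents))  # 4.0: four points, each 1 unit from its centroid
```

K-Means never increases this quantity: the assignment step cannot raise it (each point moves to a nearer centroid) and the update step cannot either (the mean minimizes squared distance within a cluster).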