{% extends "layout.html" %} {% block content %} Dynamic DBSCAN Clustering Visualization

Understanding DBSCAN

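DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a density-based clustering algorithm. It groups points that are closely packed together and marks points in sparse regions as noise. Clusters are identified as dense regions of the feature space separated by regions of lower density. K-Means and hierarchical clustering work best when clusters are compact and spherical. DBSCAN, by contrast, handles common real-world data irregularities such as:

1. Arbitrary-shaped clusters: clusters can take any shape, not only circular or convex.
2. Noise and outliers: points in low-density regions are labeled as noise instead of being forced into a cluster.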

The figure below compares these algorithms on the same dataset. K-Means and hierarchical clustering handle compact, spherical clusters with varying tolerance to noise. DBSCAN recovers arbitrary-shaped clusters and labels noise points.

Key Parameters in DBSCAN

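1. eps ($\epsilon$): The radius of the neighborhood around a data point. If the distance between two points is less than or equal to $\epsilon$, they are considered neighbors. A common way to choose $\epsilon$ is to analyze the k-distance graph. Choosing the right $\epsilon$ is important:
   - If $\epsilon$ is too small, most points have sparse neighborhoods and are classified as noise.
   - If $\epsilon$ is too large, separate clusters merge and the algorithm can no longer distinguish between them.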

2. MinPts: The minimum number of points required within the $\epsilon$ radius of a point to form a dense region. A general rule of thumb is to set $\text{MinPts} \ge D + 1$, where $D$ is the number of dimensions in the dataset.
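To make the two parameters concrete, here is a minimal Python sketch of the k-distance graph used to choose $\epsilon$. It assumes Euclidean distance, a hypothetical helper named `k_distances`, and the convention that a point counts toward its own MinPts. Under that convention, a point has a dense neighborhood exactly when its (MinPts - 1)-th nearest other point lies within $\epsilon$, so that is the distance computed here. The sample data is made up for illustration.

```python
import math

def k_distances(points, k):
    """Return each point's distance to its k-th nearest *other* point, sorted ascending.

    Plotting these values (x = rank, y = distance) gives the k-distance graph;
    a common heuristic picks eps near the "elbow" where the curve turns sharply upward.
    """
    result = []
    for i, p in enumerate(points):
        # Distances from p to every other point, nearest first.
        others = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(others[k - 1])
    return sorted(result)

# Hypothetical 2-D data: two tight groups plus one far-away outlier.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (30, 30)]

D = 2                # number of dimensions
min_pts = D + 1      # rule of thumb: MinPts >= D + 1
k = min_pts - 1      # the point itself counts toward MinPts
print(k_distances(points, k))
```

Here every point in the two tight groups has a k-distance of 1.0, while the outlier's is about 27.6. Any $\epsilon$ between those values (for example 1.5) separates the dense points from the outlier. On real data the curve is smoother, and the elbow is read off a plot.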

How Does DBSCAN Work?

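DBSCAN works by categorizing data points into three types:

1. Core points: points that have at least MinPts points (counting themselves) within their $\epsilon$-neighborhood.
2. Border points: points that have fewer than MinPts points in their neighborhood but lie within $\epsilon$ of a core point.
3. Noise points: points that are neither core points nor border points. They do not belong to any cluster.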
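A minimal Python sketch of the three-way split into core, border, and noise points, assuming Euclidean distance and an $\epsilon$-neighborhood that includes the point itself. The function `classify_points` and the sample data are hypothetical and are not taken from this visualization's code.

```python
import math

def classify_points(points, eps, min_pts):
    """Label each point 'core', 'border', or 'noise'.

    The eps-neighborhood includes the point itself, so a point is core when at
    least min_pts points (itself included) lie within distance eps.
    """
    n = len(points)
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_pts for nb in neighbors]
    labels = []
    for i in range(n):
        if is_core[i]:
            labels.append("core")
        elif any(is_core[j] for j in neighbors[i]):
            labels.append("border")   # not dense itself, but within eps of a core point
        else:
            labels.append("noise")    # no core point within eps
    return labels

# Hypothetical data: a "plus" shape, a point just past its right arm, and an outlier.
points = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1), (2, 0), (5, 5)]
print(classify_points(points, eps=1.0, min_pts=4))
# → ['core', 'border', 'border', 'border', 'border', 'noise', 'noise']
```

Note that (2, 0) is within $\epsilon$ of the border point (1, 0) but not of any core point, so it is labeled noise: being near a cluster is not enough, it has to be near a dense point.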

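Steps in the DBSCAN Algorithm

1. Identify core points: for each point, count the points within its $\epsilon$-neighborhood. If the count is at least MinPts, mark it as a core point.
2. Form clusters: for each core point not yet assigned to a cluster, start a new cluster and add every point within $\epsilon$ of it.
3. Expand through density connectivity: whenever a newly added point is itself a core point, add its $\epsilon$-neighbors too, repeating until no new points are reached. Two points end up in the same cluster when a chain of core points, each within $\epsilon$ of the next, links them (the endpoints themselves may be border points).
4. Label noise: any point not assigned to a cluster after all points are processed is labeled as noise.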
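The DBSCAN procedure can be sketched in Python with a queue-based cluster expansion. This is a minimal sketch, not this visualization's implementation. It assumes Euclidean distance and a neighborhood that includes the point itself, and the names `dbscan`, `region_query`, and `NOISE` are hypothetical. The brute-force neighbor search is O(n²); libraries such as scikit-learn can use tree-based spatial indexes instead.

```python
import math
from collections import deque

NOISE = -1

def dbscan(points, eps, min_pts):
    """Return one cluster id per point (0, 1, 2, ...) or NOISE (-1)."""
    n = len(points)

    def region_query(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    labels = [None] * n          # None = not yet visited
    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = region_query(i)
        if len(neighbors) < min_pts:
            labels[i] = NOISE    # may later be relabeled as a border point
            continue
        # i is an unvisited core point: start a new cluster and expand it.
        labels[i] = cluster_id
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id   # former noise becomes a border point
            if labels[j] is not None:
                continue                 # already assigned: do not expand again
            labels[j] = cluster_id
            j_neighbors = region_query(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is core: keep expanding through it
        cluster_id += 1
    return labels

# Hypothetical data: two tight groups plus one outlier.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (30, 30)]
print(dbscan(points, eps=1.5, min_pts=3))  # → [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

A point first marked as noise is never expanded even if a cluster later reaches it, because it was already found to have too few neighbors to be a core point.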

How this DBSCAN Visualization Handles User-Added Data

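In this interactive visualization, when you click the "Add New Point & Cluster" button, the point you specify is appended to the existing dataset. The entire DBSCAN algorithm is then re-run from scratch on the updated dataset. This means that:

1. The cluster labels of existing points can change, not only the label of the new point.
2. A new point can turn nearby border or noise points into core points, which can grow a cluster or merge two clusters into one.
3. The new point is labeled noise if it is neither a core point nor within $\epsilon$ of a core point.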
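The following Python sketch illustrates why re-running from scratch matters. It is not the visualization's actual code. `dbscan` here is a compact, hypothetical implementation (Euclidean distance, neighborhood includes the point itself), and the points are made up. Two groups on a line form separate clusters until a point added in the gap turns their edge points into core points. After that, a fresh run merges everything into one cluster.

```python
import math

def dbscan(points, eps, min_pts):
    """Compact DBSCAN: returns a cluster id per point, or -1 for noise."""
    n = len(points)
    nbrs = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
            for i in range(n)]
    core = [len(nb) >= min_pts for nb in nbrs]
    labels = [-1] * n
    cluster_id = 0
    for i in range(n):
        if not core[i] or labels[i] != -1:
            continue
        # Flood-fill from an unassigned core point; only core points propagate.
        labels[i] = cluster_id
        stack = [i]
        while stack:
            p = stack.pop()
            for q in nbrs[p]:
                if labels[q] == -1:
                    labels[q] = cluster_id
                    if core[q]:
                        stack.append(q)
        cluster_id += 1
    return labels

before = [(0, 0), (1, 0), (2, 0), (4, 0), (5, 0), (6, 0)]
print(dbscan(before, eps=1.0, min_pts=3))  # → [0, 0, 0, 1, 1, 1]

after = before + [(3, 0)]                  # user adds a point in the gap
print(dbscan(after, eps=1.0, min_pts=3))   # → [0, 0, 0, 0, 0, 0, 0]
```

The added point (3, 0) gives (2, 0) and (4, 0) a third neighbor each, so they become core points and link the two groups. An incremental update that only labeled the new point would miss this merge.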

{% endblock %}