"""
umap.py
This module defines a Uniform Manifold Approximation and Projection (UMAP) model
for dimensionality reduction. UMAP is a nonlinear dimensionality reduction technique
that is efficient for visualizing and analyzing high-dimensional data.
Key Features:
- Preserves local structure and often retains more global structure than t-SNE.
- Typically scales to larger datasets more efficiently than t-SNE.
- Suitable for exploratory data analysis and clustering.
Parameters:
- n_components (int): Number of dimensions for projection (default: 2 for visualization).
- n_neighbors (int): Size of the local neighborhood used for manifold approximation.
    - Typical values range from 5 to 50.
- min_dist (float): Minimum distance between points in the low-dimensional space.
    - Smaller values produce tighter clusters.
- metric (str): Distance metric for computing similarity (default: 'euclidean').
Defaults:
- n_components=2: Projects the data into a 2D space for visualization.
- n_neighbors=15: Balances local and global structure preservation.
- min_dist=0.1: Keeps clusters moderately compact while still preserving broader structure.
Requirements:
- umap-learn library must be installed.
"""
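# Import UMAP from the umap-learn library.
# Note: this file is itself named umap.py, so import it as part of its package. Running it
# directly from its own directory typically makes `umap` resolve to this file, and the
# import below then fails.
import umap.umap_ as umap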
# Define the UMAP estimator
estimator = umap.UMAP(n_components=2, n_neighbors=15, min_dist=0.1) # Default configuration for 2D projection
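
# Usage sketch (added for illustration; not part of the original module).
# `embed` is a hypothetical helper name. It assumes `data` is array-like with shape
# (n_samples, n_features) and that `reducer` exposes a scikit-learn style
# `fit_transform` method, as umap.UMAP does.
def embed(data, reducer=None):
    """Fit the reducer on `data` and return the low-dimensional embedding."""
    if reducer is None:
        # Fall back to the module-level default estimator defined above.
        reducer = estimator
    if len(data) == 0:
        raise ValueError("data must contain at least one sample")
    # With the defaults above, the result has shape (n_samples, 2).
    return reducer.fit_transform(data)


# Example call (requires numpy and umap-learn; the random data is a placeholder):
#     import numpy as np
#     X = np.random.default_rng(0).normal(size=(200, 10))
#     embedding = embed(X)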