| # Clustering Models | |
| This directory contains Python scripts defining various **clustering** models and their associated hyperparameter grids. Each model file sets up a scikit-learn–compatible clustering estimator (e.g., `KMeans`, `DBSCAN`, `GaussianMixture`) and defines a param grid for the `train_clustering_model.py` script. | |
| **Key Points**: | |
| - **Estimator**: Usually supports `.fit(X)` for unsupervised training, and either `.labels_` or `.predict(X)` to retrieve cluster assignments. | |
| - **Parameter Grid (`param_grid`)**: Used for silhouette-based hyperparameter tuning in `train_clustering_model.py`. | |
| - **Default Scoring**: Often `'silhouette'`, but can be changed if you adapt your tuning logic. | |
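The difference between `.labels_` and `.predict(X)` matters in practice: in scikit-learn, `DBSCAN` exposes only `labels_`, `GaussianMixture` exposes only `predict`, and `KMeans` offers both. The following is a minimal sketch of how a caller might retrieve assignments uniformly. The helper name `get_cluster_labels` and the synthetic data are illustrative assumptions, not part of this repository.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.mixture import GaussianMixture


def get_cluster_labels(estimator, X):
    """Fit `estimator` on X and return one cluster label per row.

    Hypothetical helper for illustration; not taken from the repository.
    """
    estimator.fit(X)
    # KMeans and DBSCAN store training assignments in `labels_`.
    if hasattr(estimator, "labels_"):
        return estimator.labels_
    # GaussianMixture has no `labels_`; it assigns clusters via `predict`.
    return estimator.predict(X)


# Two tight, well-separated synthetic blobs (illustrative data only).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.1, size=(20, 2)),
    rng.normal(5.0, 0.1, size=(20, 2)),
])

estimators = (
    KMeans(n_clusters=2, n_init=10, random_state=42),
    DBSCAN(eps=0.5, min_samples=3),
    GaussianMixture(n_components=2, random_state=42),
)
for est in estimators:
    labels = get_cluster_labels(est, X)
    # DBSCAN marks noise points with -1, so exclude it when counting clusters.
    n_clusters = len(set(labels) - {-1})
    print(f"{type(est).__name__}: {n_clusters} clusters")
```

Checking for `labels_` first keeps the helper working for estimators such as `DBSCAN` that cannot predict on new data.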
| **Note**: Preprocessing (dropping columns, label encoding) and the hyperparameter loop are handled externally by the script/utility. These model definition files simply define: | |
| - An **estimator** (like `KMeans(n_clusters=3, random_state=42)`). | |
| - A **`param_grid`** for silhouette tuning (e.g., `{'model__n_clusters':[2,3,4]}`). | |
| - Optionally, a **`default_scoring`** set to `'silhouette'`. | |
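Put together, a model definition file could look roughly like the sketch below. The values are the ones quoted in the list above. The module-level variable name `estimator` is an assumption (this README does not state the exact name the script looks up), and the `model__` prefix suggests, but does not confirm, that the script wraps the estimator in a pipeline step named `model`.

```python
# Sketch of a model definition file such as kmeans.py.
from sklearn.cluster import KMeans

# Base estimator; the variable name `estimator` is an assumption.
estimator = KMeans(n_clusters=3, random_state=42)

# Grid searched during silhouette tuning. The "model__" prefix assumes the
# training script places the estimator in a Pipeline step called "model".
param_grid = {"model__n_clusters": [2, 3, 4]}

# Optional: scoring method used when tuning.
default_scoring = "silhouette"
```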
| ## Available Clustering Models | |
| - [KMeans](kmeans.py) | |
| - [DBSCAN](dbscan.py) | |
| - [Gaussian Mixture](gaussian_mixture.py) | |
| - [Agglomerative Clustering (Hierarchical)](hierarchical_clustering.py) | |
| ### Usage | |
| To train or tune any clustering model, specify the `--model_module` argument with the appropriate model name (e.g., `kmeans`) when running `train_clustering_model.py`, for example: | |
| ```bash | |
| python scripts/train_clustering_model.py \ | |
| --model_module kmeans \ | |
| --data_path data/mall_customer/Mall_Customers.csv \ | |
| --tune \ | |
| --visualize | |
| ``` | |
| This will: | |
| 1. Load the chosen model definition (`kmeans.py`). | |
| 2. Run silhouette-based hyperparameter tuning if `--tune` is passed. | |
| 3. Fit the final model, save it, and generate a 2D scatter plot if `--visualize` is passed. | |
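The tuning and fitting steps above can be sketched as follows. This is not the code in `train_clustering_model.py`; it is a minimal illustration of silhouette-based grid search under stated assumptions: the estimator sits in a pipeline step named `model`, the function name `tune_by_silhouette` is hypothetical, and the data is synthetic. Saving and plotting are omitted because the README does not describe how the script does them.

```python
import numpy as np
from sklearn.base import clone
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.model_selection import ParameterGrid
from sklearn.pipeline import Pipeline


def tune_by_silhouette(pipeline, param_grid, X):
    """Try every grid combination and return (best_params, best_score).

    Hypothetical helper; the real script's tuning logic may differ.
    """
    best_params, best_score = None, float("-inf")
    for params in ParameterGrid(param_grid):
        # Clone so each candidate starts from an unfitted copy.
        candidate = clone(pipeline).set_params(**params)
        labels = candidate.fit_predict(X)
        # The silhouette score needs between 2 and n_samples - 1 distinct labels.
        n_labels = len(set(labels))
        if n_labels < 2 or n_labels >= len(X):
            continue
        score = silhouette_score(X, labels)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Three tight synthetic blobs (illustrative data only).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([rng.normal(c, 0.1, size=(30, 2)) for c in centers])

pipeline = Pipeline([("model", KMeans(n_clusters=3, n_init=10, random_state=42))])
param_grid = {"model__n_clusters": [2, 3, 4]}

best_params, best_score = tune_by_silhouette(pipeline, param_grid, X)

# Fit the final model with the winning parameters.
final_model = clone(pipeline).set_params(**best_params).fit(X)
print(best_params, round(best_score, 3))
```

A plain loop is used here instead of `GridSearchCV` because the silhouette score is computed from the cluster labels alone, with no held-out targets or cross-validation splits involved.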