
	# Clustering Models

	This directory contains Python scripts defining various clustering models and their associated hyperparameter grids. Each model file sets up a scikit-learn–compatible clustering estimator (e.g., `KMeans`, `DBSCAN`, `GaussianMixture`) and defines a param grid for the `train_clustering_model.py` script.

	Key Points:
	- Estimator: Usually supports `.fit(X)` for unsupervised training, and either `.labels_` or `.predict(X)` to retrieve cluster assignments.
	- Parameter Grid (`param_grid`): Used for silhouette-based hyperparameter tuning in `train_clustering_model.py`.
	- Default Scoring: Often `'silhouette'`, but can be changed if you adapt your tuning logic.
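
The `.labels_` versus `.predict(X)` distinction can be handled with a small helper. The sketch below is illustrative only: `get_cluster_labels` is a hypothetical name, not a function from this repository, and the data is synthetic.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.mixture import GaussianMixture


def get_cluster_labels(estimator, X):
    """Fit `estimator` on X and return one cluster label per row.

    Hypothetical helper for illustration; not part of the repository.
    """
    estimator.fit(X)
    # Clusterers such as KMeans and DBSCAN store assignments in `labels_`.
    if hasattr(estimator, "labels_"):
        return estimator.labels_
    # GaussianMixture has no `labels_`, so fall back to `predict`.
    return estimator.predict(X)


# Two well-separated synthetic blobs, purely for demonstration.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.3, (20, 2)),
    rng.normal(5.0, 0.3, (20, 2)),
])

kmeans_labels = get_cluster_labels(
    KMeans(n_clusters=2, n_init=10, random_state=42), X
)
dbscan_labels = get_cluster_labels(DBSCAN(eps=1.0, min_samples=3), X)
gmm_labels = get_cluster_labels(
    GaussianMixture(n_components=2, random_state=42), X
)
```

Note that DBSCAN marks noise points with the label `-1`, so downstream code may need to treat that value separately.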

	Note: Preprocessing (dropping columns, label encoding) and any hyperparameter loop are handled externally by the script/utility. These model definition files simply define:
	- An estimator (like `KMeans(n_clusters=3, random_state=42)`).
	- A `param_grid` for silhouette tuning (e.g., `{'model__n_clusters':[2,3,4]}`).
	- Optionally, a `default_scoring` set to `'silhouette'`.
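
Put together, a model definition file might look roughly like the sketch below. This is an assumed layout built from the three items above, not the actual contents of `kmeans.py`; in particular, the reading of the `model__` prefix as a pipeline step name is an assumption.

```python
# Hypothetical contents of a model definition file such as kmeans.py.
from sklearn.cluster import KMeans

# Base estimator with fixed defaults.
estimator = KMeans(n_clusters=3, random_state=42)

# Candidate values explored during silhouette tuning. The "model__" prefix
# suggests (assumption) that the training script wraps the estimator in a
# scikit-learn Pipeline whose final step is named "model".
param_grid = {"model__n_clusters": [2, 3, 4]}

# Optional: name of the metric the tuning logic should optimize.
default_scoring = "silhouette"
```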

	## Available Clustering Models

	- [KMeans](kmeans.py)
	- [DBSCAN](dbscan.py)
	- [Gaussian Mixture](gaussian_mixture.py)
	- [Agglomerative Clustering (Hierarchical)](hierarchical_clustering.py)

	### Usage

	To train or tune a clustering model, pass its module name (e.g., `kmeans`) via the `--model_module` argument when running `train_clustering_model.py`:

	```bash
	python scripts/train_clustering_model.py \
	--model_module kmeans \
	--data_path data/mall_customer/Mall_Customers.csv \
	--tune \
	--visualize
	```

	This will:
	1. Load the chosen model definition (`kmeans.py`).
	2. Run silhouette-based hyperparameter tuning if `--tune` is passed.
	3. Fit the final model, save it, and generate a 2D scatter plot if `--visualize` is passed.
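
For intuition, silhouette-based tuning over a `param_grid` can be sketched as follows. This is not the code in `train_clustering_model.py`; `tune_by_silhouette` is a hypothetical function, the one-step pipeline named `"model"` is an assumption inferred from the `model__` prefix, and the data is synthetic.

```python
import numpy as np
from sklearn.base import clone
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.model_selection import ParameterGrid
from sklearn.pipeline import Pipeline


def tune_by_silhouette(estimator, param_grid, X):
    """Return (best_pipeline, best_params, best_score) over the grid.

    Hypothetical sketch; the real script may differ.
    """
    best_pipeline, best_params, best_score = None, None, -1.0
    for params in ParameterGrid(param_grid):
        # Assumed layout: the estimator sits in a step named "model".
        pipeline = Pipeline([("model", clone(estimator))])
        pipeline.set_params(**params)
        labels = pipeline.fit_predict(X)
        # The silhouette score needs at least two distinct labels.
        if len(set(labels)) < 2:
            continue
        score = silhouette_score(X, labels)
        if score > best_score:
            best_pipeline, best_params, best_score = pipeline, params, score
    return best_pipeline, best_params, best_score


# Three well-separated synthetic blobs, purely for demonstration.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([rng.normal(c, 0.3, (30, 2)) for c in centers])

best_pipeline, best_params, best_score = tune_by_silhouette(
    KMeans(n_clusters=3, n_init=10, random_state=42),
    {"model__n_clusters": [2, 3, 4]},
    X,
)
```

The sketch scores every label as a cluster, including DBSCAN's `-1` noise label, so a real tuning loop may want to filter noise points before scoring.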