Clustering Models

This directory contains Python scripts that define clustering models and their associated hyperparameter grids. Each model file sets up a scikit-learn–compatible clustering estimator (e.g., KMeans, DBSCAN, GaussianMixture) and defines a parameter grid (param_grid) that the train_clustering_model.py script reads.

Key Points:

Estimator: Usually supports .fit(X) for unsupervised training, and either .labels_ or .predict(X) to retrieve cluster assignments.
Parameter Grid (param_grid): Used for silhouette-based hyperparameter tuning in train_clustering_model.py.
Default Scoring: Often 'silhouette', but can be changed if you adapt your tuning logic.
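The difference between .labels_ and .predict(X) matters in practice: in scikit-learn, DBSCAN exposes only .labels_, GaussianMixture exposes only .predict(X), and KMeans offers both. A minimal sketch of a helper that handles both cases is shown below. The name get_cluster_labels and the toy data are assumptions made for illustration; they are not taken from this repository.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.mixture import GaussianMixture


def get_cluster_labels(estimator, X):
    """Fit `estimator` on X and return one cluster label per row.

    Hypothetical helper for illustration; not part of this repository.
    """
    estimator.fit(X)
    # Estimators such as KMeans and DBSCAN store assignments on the fitted object.
    if hasattr(estimator, "labels_"):
        return np.asarray(estimator.labels_)
    # Estimators such as GaussianMixture only assign labels through predict().
    return np.asarray(estimator.predict(X))


# Toy data (assumption): two tight, well-separated 2D blobs of 20 points each.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(20, 2)),
    rng.normal(loc=5.0, scale=0.1, size=(20, 2)),
])

kmeans_labels = get_cluster_labels(KMeans(n_clusters=2, n_init=10, random_state=42), X)
dbscan_labels = get_cluster_labels(DBSCAN(eps=0.5, min_samples=5), X)
gmm_labels = get_cluster_labels(GaussianMixture(n_components=2, random_state=42), X)
```

Checking for .labels_ first keeps the helper independent of any particular estimator class, at the cost of ignoring .predict(X) when both are available.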

Note: Preprocessing (dropping columns, label encoding) and any hyperparameter loop are handled externally by the script/utility. The model definition files simply define:

An estimator (like KMeans(n_clusters=3, random_state=42)).
A param_grid for silhouette tuning (e.g., {'model__n_clusters':[2,3,4]}).
Optionally, a default_scoring set to 'silhouette'.
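Put together, a model definition file might look like the sketch below. Treat it as an illustration under assumptions rather than the actual contents of kmeans.py: the variable name estimator is a guess, and n_init=10 is set explicitly only to keep behaviour stable across scikit-learn versions. The model__ prefix in the grid suggests the training script wraps the estimator in a Pipeline step named "model"; the last lines check the grid keys against such a pipeline, which is likewise an assumption.

```python
from sklearn.cluster import KMeans
from sklearn.pipeline import Pipeline

# Estimator to be fitted by the training script (variable name is an assumption).
estimator = KMeans(n_clusters=3, n_init=10, random_state=42)

# Grid searched during silhouette tuning; keys follow scikit-learn's
# <step>__<parameter> convention for a step called "model".
param_grid = {"model__n_clusters": [2, 3, 4]}

# Optional default scoring name.
default_scoring = "silhouette"

# Sanity check (assumption: the script builds a pipeline with a "model" step).
pipeline = Pipeline([("model", estimator)])
valid_params = pipeline.get_params()
unknown_keys = [key for key in param_grid if key not in valid_params]
```

An empty unknown_keys list means every grid key resolves to a real parameter of the wrapped estimator.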

Available Clustering Models

Usage

To train or tune a clustering model, pass its module name (e.g., kmeans) via the --model_module argument when running train_clustering_model.py. For example:

python scripts/train_clustering_model.py \
  --model_module kmeans \
  --data_path data/mall_customer/Mall_Customers.csv \
  --tune \
  --visualize

This will:

Load the chosen model definition (kmeans.py).
Perform silhouette-based hyperparameter tuning if --tune is used.
Fit the final model, save it, and generate a 2D scatter plot if requested.
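The tuning step can be pictured as a loop over the parameter grid that keeps the configuration with the highest silhouette score. The sketch below is not the code in train_clustering_model.py; tune_by_silhouette, the pipeline layout, and the synthetic data are all assumptions made for illustration. It relies only on standard scikit-learn calls (ParameterGrid, clone, silhouette_score).

```python
import numpy as np
from sklearn.base import clone
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.model_selection import ParameterGrid
from sklearn.pipeline import Pipeline


def tune_by_silhouette(pipeline, param_grid, X):
    """Return (best_model, best_params, best_score) over the grid.

    Hypothetical helper; assumes the final pipeline step supports fit_predict.
    """
    best_model, best_params, best_score = None, None, float("-inf")
    for params in ParameterGrid(param_grid):
        # Clone so each candidate starts from an unfitted copy.
        candidate = clone(pipeline).set_params(**params)
        labels = candidate.fit_predict(X)
        n_labels = len(set(labels.tolist()))
        # silhouette_score needs between 2 and n_samples - 1 distinct labels.
        if n_labels < 2 or n_labels >= len(X):
            continue
        score = silhouette_score(X, labels)
        if score > best_score:
            best_model, best_params, best_score = candidate, params, score
    return best_model, best_params, best_score


# Synthetic data (assumption): three tight blobs of 30 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([rng.normal(loc=c, scale=0.2, size=(30, 2)) for c in centers])

pipeline = Pipeline([("model", KMeans(n_clusters=3, n_init=10, random_state=42))])
param_grid = {"model__n_clusters": [2, 3, 4]}

best_model, best_params, best_score = tune_by_silhouette(pipeline, param_grid, X)
```

Candidates that collapse to a single cluster are skipped because the silhouette score is undefined for them. For DBSCAN-style estimators, note that the noise label -1 is treated here as an ordinary cluster.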