
Clustering Models

This directory contains Python scripts that define various clustering models and their hyperparameter grids. Each model file sets up a scikit-learn-compatible clustering estimator (e.g., KMeans, DBSCAN, GaussianMixture) and defines a parameter grid for use by the train_clustering_model.py script.

Key Points:

  • Estimator: Usually supports .fit(X) for unsupervised training, and either .labels_ or .predict(X) to retrieve cluster assignments.
  • Parameter Grid (param_grid): Used for silhouette-based hyperparameter tuning in train_clustering_model.py.
  • Default Scoring: Often 'silhouette', but can be changed if you adapt your tuning logic.
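Because some estimators expose cluster assignments through .labels_ and others only through .predict(X), code that consumes these models has to handle both. The sketch below shows one way to do that with standard scikit-learn estimators; the helper name fit_and_get_labels is hypothetical and is not taken from train_clustering_model.py.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture


def fit_and_get_labels(estimator, X):
    """Hypothetical helper: fit an unsupervised estimator and return one label per row of X."""
    estimator.fit(X)
    if hasattr(estimator, "labels_"):
        # KMeans and DBSCAN store the training-set assignments in labels_.
        return np.asarray(estimator.labels_)
    # GaussianMixture has no labels_, so cluster assignments come from predict(X).
    return np.asarray(estimator.predict(X))


# Synthetic data: three Gaussian blobs.
X, _ = make_blobs(n_samples=90, centers=3, random_state=0)
for est in (
    KMeans(n_clusters=3, n_init=10, random_state=42),
    DBSCAN(eps=0.8, min_samples=5),
    GaussianMixture(n_components=3, random_state=42),
):
    labels = fit_and_get_labels(est, X)
    print(type(est).__name__, labels.shape, sorted(set(labels.tolist())))
```

Checking labels_ first keeps DBSCAN working, since it has no predict method; DBSCAN may also emit -1 for points it treats as noise.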

Note: Preprocessing (dropping columns, label encoding) and any hyperparameter search loop are handled externally by the training script and its utilities. These model definition files simply define:

  • An estimator (like KMeans(n_clusters=3, random_state=42)).
  • A param_grid for silhouette tuning (e.g., {'model__n_clusters':[2,3,4]}).
  • Optionally, a default_scoring set to 'silhouette'.
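A minimal sketch of what a definition file such as kmeans.py might contain, based only on the three items listed above. The variable names estimator, param_grid and default_scoring follow this README; the model__ prefix suggests (an assumption) that the training script wraps the estimator in a scikit-learn Pipeline under a step named "model". The actual file may differ.

```python
# Hypothetical contents of a model definition file (e.g., kmeans.py).
from sklearn.cluster import KMeans

# Base estimator; the training script is assumed to tune and refit it.
estimator = KMeans(n_clusters=3, random_state=42)

# The "model__" prefix assumes the estimator sits in a Pipeline
# under a step named "model", so keys route to KMeans parameters.
param_grid = {
    "model__n_clusters": [2, 3, 4],
}

# Metric the tuning loop is assumed to use to compare candidates.
default_scoring = "silhouette"
```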

Available Clustering Models

Usage

To train or tune any clustering model, specify the --model_module argument with the appropriate model name (e.g., kmeans) when running train_clustering_model.py, for example:

python scripts/train_clustering_model.py \
  --model_module kmeans \
  --data_path data/mall_customer/Mall_Customers.csv \
  --tune \
  --visualize

This will:

  1. Load the chosen model definition (kmeans.py).
  2. Perform optional silhouette-based hyperparameter tuning if --tune is used.
  3. Fit the final model, save it, and generate a 2D scatter plot if --visualize is passed.
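Silhouette-based tuning over a param_grid can be sketched as follows. This is an illustration under stated assumptions, not the code in train_clustering_model.py: it assumes each candidate is evaluated by wrapping the estimator in a Pipeline step named "model" (matching the model__ keys), fitting on the full dataset, and keeping the setting with the highest silhouette score. The function name tune_by_silhouette is hypothetical.

```python
import numpy as np
from sklearn.base import clone
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.model_selection import ParameterGrid
from sklearn.pipeline import Pipeline


def tune_by_silhouette(estimator, param_grid, X):
    """Hypothetical tuner: return (best_params, best_score, fitted_pipeline)."""
    best_params, best_score, best_pipe = None, -np.inf, None
    for params in ParameterGrid(param_grid):
        # Fresh copy per candidate, wrapped so "model__" keys resolve.
        pipe = Pipeline([("model", clone(estimator))])
        pipe.set_params(**params)
        pipe.fit(X)
        model = pipe.named_steps["model"]
        labels = model.labels_ if hasattr(model, "labels_") else model.predict(X)
        # Silhouette is only defined for 2 <= n_labels <= n_samples - 1.
        n_labels = len(set(labels))
        if n_labels < 2 or n_labels > len(X) - 1:
            continue
        score = silhouette_score(X, labels)
        if score > best_score:
            best_params, best_score, best_pipe = params, score, pipe
    return best_params, best_score, best_pipe


# Example run on synthetic data (three Gaussian blobs).
X, _ = make_blobs(n_samples=150, centers=3, random_state=0)
best_params, best_score, best_pipe = tune_by_silhouette(
    KMeans(n_clusters=3, n_init=10, random_state=42),
    {"model__n_clusters": [2, 3, 4]},
    X,
)
print(best_params, round(best_score, 3))
```

The refitted best pipeline could then be persisted, for example with joblib.dump; the file name and format the script actually uses are not documented here.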