
User Clustering Model

This repository contains models and artifacts for a user clustering pipeline.

Models

  • Preprocessor (OneHotEncoder + StandardScaler)
  • UMAP reducer for dimensionality reduction
  • KMeans clustering model with k=15
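The preprocessor combines one-hot encoding with standardization. The following is a minimal scikit-learn sketch of how such a preprocessor could be assembled. It assumes that OneHotEncoder handles the categorical columns and StandardScaler the numeric ones, and the column names are hypothetical placeholders rather than the features this model was trained on.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical feature names; replace them with the columns your pipeline actually uses.
CATEGORICAL_COLS = ["country", "device"]
NUMERIC_COLS = ["age", "sessions_per_week"]


def build_preprocessor() -> ColumnTransformer:
    """Assemble a preprocessor that one-hot encodes categoricals and scales numerics."""
    return ColumnTransformer(
        transformers=[
            # Categories unseen during fitting are encoded as all zeros instead of raising.
            ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL_COLS),
            # Center numeric features to zero mean and scale them to unit variance.
            ("num", StandardScaler(), NUMERIC_COLS),
        ]
    )


if __name__ == "__main__":
    df = pd.DataFrame(
        {
            "country": ["DE", "FR", "DE", "US"],
            "device": ["ios", "android", "web", "ios"],
            "age": [25, 31, 47, 38],
            "sessions_per_week": [3, 7, 1, 4],
        }
    )
    X = build_preprocessor().fit_transform(df)
    print(X.shape)  # prints (4, 8): 3 + 3 one-hot columns plus 2 scaled columns
```

Setting handle_unknown="ignore" keeps inference from failing on categories that never appeared in training, which matters for a model applied to fresh user data.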

Metrics

  • Best silhouette score on training data: 0.4733
  • Recommended silhouette score threshold for triggering automatic retraining: 0.4
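The silhouette score measures how well separated the clusters are. For each point it compares a, the mean distance to the other members of its own cluster, with b, the mean distance to the members of the nearest other cluster, giving s = (b - a) / max(a, b) in [-1, 1]. The reported score is the mean of s over all points. The dependency-free sketch below only illustrates the definition; it assumes Euclidean distance (scikit-learn's default), and in practice sklearn.metrics.silhouette_score is the faster choice.

```python
import math
from collections import defaultdict
from typing import Sequence


def silhouette_score(points: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean silhouette coefficient with Euclidean distance (O(n^2), for illustration only)."""
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)
    if not 2 <= len(clusters) <= len(points) - 1:
        raise ValueError("silhouette needs between 2 and n - 1 clusters")

    total = 0.0
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            continue  # singleton clusters contribute s(i) = 0, as in scikit-learn
        # a: mean distance to the rest of the point's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)


# Two tight, well-separated clusters score close to 1.
pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(round(silhouette_score(pts, [0, 0, 1, 1]), 4))  # prints 0.9002
```

A training score of 0.4733 therefore indicates reasonably, but not sharply, separated clusters, and the 0.4 threshold flags a moderate drop from that level.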

Files

  • preprocessor.joblib : preprocessing pipeline
  • umap_reducer.joblib : UMAP reducer
  • kmeans_model.joblib : KMeans model
  • top_categories.json : top categories used to limit the cardinality of categorical features
  • cluster_sizes.png : cluster distribution plot
  • metadata.json : metrics and parameters
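top_categories.json is used for cardinality limiting. A minimal sketch of that step follows. It assumes the file maps each categorical column to its list of retained categories and that any other value is collapsed into a single bucket; both that layout and the "Other" bucket name are hypothetical and must match whatever the training pipeline actually did.

```python
import json
from typing import Dict, List


def load_top_categories(path: str = "top_categories.json") -> Dict[str, List[str]]:
    """Load retained categories per column (assumed layout: {"column": ["cat", ...]})."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def limit_cardinality(
    record: dict, top_categories: Dict[str, List[str]], other: str = "Other"
) -> dict:
    """Return a copy of `record` with categories outside the top list collapsed.

    The `other` bucket name is a hypothetical placeholder; if it differs from the value
    used in training, the preprocessor will treat it as an unseen category.
    """
    limited = dict(record)
    for column, allowed in top_categories.items():
        if column in limited and limited[column] not in allowed:
            limited[column] = other
    return limited


top = {"country": ["DE", "FR", "US"]}
print(limit_cardinality({"country": "NZ", "age": 40}, top))
# prints {'country': 'Other', 'age': 40}
```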

Usage

Load each model with joblib.load(). Then transform incoming data with the preprocessor, project it into the low-dimensional space with the UMAP reducer, and assign clusters with the KMeans model's predict().
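The usage steps can be sketched as follows. This is a minimal sketch, not the pipeline's own inference code: the helper names load_artifacts and predict_clusters are hypothetical, and unpickling umap_reducer.joblib assumes umap-learn and scikit-learn are installed in versions compatible with those used for training.

```python
from pathlib import Path


def load_artifacts(model_dir: str = "."):
    """Load the preprocessor, UMAP reducer, and KMeans model from a local copy of this repo."""
    import joblib  # deferred so predict_clusters can be used without joblib installed

    base = Path(model_dir)
    preprocessor = joblib.load(str(base / "preprocessor.joblib"))
    reducer = joblib.load(str(base / "umap_reducer.joblib"))
    kmeans = joblib.load(str(base / "kmeans_model.joblib"))
    return preprocessor, reducer, kmeans


def predict_clusters(data, preprocessor, reducer, kmeans):
    """Apply the stages in training order: preprocess, embed with UMAP, assign nearest centroid."""
    features = preprocessor.transform(data)
    embedding = reducer.transform(features)
    return kmeans.predict(embedding)
```

For example, call load_artifacts("path/to/repo") once, then predict_clusters(new_users, preprocessor, reducer, kmeans), where new_users is a hypothetical DataFrame with the same columns the preprocessor was fitted on. Keeping loading separate from prediction avoids re-reading the artifacts for every batch.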

Automatic retraining can be triggered when the silhouette score on new data falls below 0.4.
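The retraining rule can be expressed as a small check. This sketch assumes the score is computed with scikit-learn on the UMAP embedding that KMeans clusters. The card does not state which space the 0.4733 training score was measured in, so match whatever the training pipeline used. The needs_retrain helper is hypothetical, and flagging a single-cluster result for retraining is our own design choice.

```python
from typing import Optional, Tuple

import numpy as np
from sklearn.metrics import silhouette_score

RETRAIN_THRESHOLD = 0.4  # recommended threshold from this model card


def needs_retrain(
    embedding: np.ndarray,
    labels: np.ndarray,
    threshold: float = RETRAIN_THRESHOLD,
    sample_size: Optional[int] = None,
    random_state: int = 0,
) -> Tuple[bool, float]:
    """Score the clustering of new data and report whether it fell below the threshold."""
    if len(np.unique(labels)) < 2:
        # Silhouette is undefined for one cluster; treat the result as degraded.
        return True, float("nan")
    # sample_size lets large batches be scored on a random subset to bound O(n^2) cost.
    score = float(
        silhouette_score(embedding, labels, sample_size=sample_size, random_state=random_state)
    )
    return score < threshold, score
```

In practice, labels would come from kmeans.predict(embedding), so the check scores the deployed model's assignments on the new batch.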

License

Specify your license here.


Generated and pushed by your clustering pipeline.