# User Clustering Model

This repository contains the trained models and supporting artifacts for a user clustering pipeline.

## Models
- Preprocessor (OneHotEncoder + StandardScaler)
- UMAP reducer for dimensionality reduction
- KMeans clustering model with k=15

## Metrics
- Best silhouette score on the training data: 0.4733
- Recommended silhouette score threshold below which an automatic retrain is triggered: 0.4
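
The threshold check above can be sketched as follows. This is a minimal illustration, not the pipeline's actual retrain logic: the function names are hypothetical, and it assumes the silhouette score is computed on the UMAP embedding with the KMeans labels, which this README does not state.

```python
def needs_retrain(score: float, threshold: float = 0.4) -> bool:
    """Return True when the silhouette score has fallen below the threshold."""
    return score < threshold


def silhouette_on_new_data(embedding, labels) -> float:
    """Silhouette score of `labels` over `embedding`.

    Assumption: `embedding` is the UMAP output and `labels` are the
    KMeans predictions for the same rows.
    """
    # Imported lazily so needs_retrain() works without scikit-learn installed.
    from sklearn.metrics import silhouette_score

    return float(silhouette_score(embedding, labels))
```

With the values listed above, the training score of 0.4733 does not trigger a retrain, while a score of 0.35 on new data would.
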

## Files
- `preprocessor.joblib` : preprocessing pipeline
- `umap_reducer.joblib` : UMAP reducer
- `kmeans_model.joblib` : KMeans model
- `top_categories.json` : top categories for cardinality limiting
- `cluster_sizes.png` : cluster distribution plot
- `metadata.json` : metrics and parameters
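
As an illustration of how `top_categories.json` might be used for cardinality limiting, the sketch below maps any category outside the stored top list to a placeholder value. The JSON layout (column name mapped to a list of allowed categories), the `"other"` placeholder, and both function names are assumptions; check them against the file and the training code before relying on this.

```python
import json
from pathlib import Path


def load_top_categories(path="top_categories.json"):
    """Read the top-category lists (assumed layout: {column: [category, ...]})."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def limit_cardinality(record, top_categories, other="other"):
    """Return a copy of `record` with rare categories replaced by `other`.

    Assumption: `other` must match the placeholder used during training.
    """
    limited = dict(record)
    for column, allowed in top_categories.items():
        # Only touch columns that are present and hold a non-top category.
        if column in limited and limited[column] not in allowed:
            limited[column] = other
    return limited
```

Columns that do not appear in the top-category mapping are passed through unchanged.
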

## Usage
Load the models with `joblib.load()`, then apply them in order: preprocess incoming data with the preprocessor, reduce it with the UMAP reducer's `transform()`, and assign clusters with the KMeans model's `predict()`.

An automatic retrain can be triggered if the silhouette score on new data falls below 0.4.
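
The loading and prediction steps above can be sketched as follows. The helper names `load_artifacts` and `predict_clusters` are hypothetical, and the sketch assumes that incoming data has the same columns the preprocessor was fitted on. Unpickling the files also requires scikit-learn and umap-learn to be installed, ideally in the versions used for training.

```python
from pathlib import Path


def load_artifacts(model_dir="."):
    """Load the three fitted models listed under Files."""
    # Imported here so predict_clusters() can be used with already-loaded models.
    import joblib

    model_dir = Path(model_dir)
    return (
        joblib.load(model_dir / "preprocessor.joblib"),
        joblib.load(model_dir / "umap_reducer.joblib"),
        joblib.load(model_dir / "kmeans_model.joblib"),
    )


def predict_clusters(new_data, preprocessor, reducer, kmeans):
    """Run preprocess -> UMAP transform -> KMeans predict on new data."""
    features = preprocessor.transform(new_data)   # encode and scale
    embedding = reducer.transform(features)       # project into UMAP space
    labels = kmeans.predict(embedding)            # nearest cluster per row
    return labels, embedding
```

A call such as `labels, embedding = predict_clusters(new_users, *load_artifacts("."))` returns one cluster label per row, where `new_users` stands in for your own input data. The embedding is returned as well so that it can be reused for a silhouette check.
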

## License
Specify your license here.

---

*Generated and pushed by your clustering pipeline.*