# User Clustering Model
This repository contains models and artifacts for a user clustering pipeline.
## Models
- Preprocessor (OneHotEncoder + StandardScaler)
- UMAP reducer for dimensionality reduction
- KMeans clustering model with k=15
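For orientation, a preprocessor of the shape listed above can be built with scikit-learn's `ColumnTransformer`. This is a hedged sketch, not this pipeline's actual code. The function name, the categorical/numeric column split, `handle_unknown="ignore"`, and the dense output are all assumptions. The matching clustering step would be `KMeans(n_clusters=15)`.

```python
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_preprocessor(categorical_cols, numeric_cols):
    """Hypothetical helper: one-hot encode categoricals, standardize numerics."""
    return ColumnTransformer(
        transformers=[
            # Assumption: unseen categories at inference time become all-zero columns.
            ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
            # Scale numeric features to zero mean and unit variance.
            ("num", StandardScaler(), numeric_cols),
        ],
        # sparse_threshold=0 makes the transformer always return a dense array.
        sparse_threshold=0.0,
    )
```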
## Metrics
- Best silhouette score on training data: 0.4733
- Recommended silhouette score threshold for triggering automatic retraining: 0.4
## Files
- `preprocessor.joblib` : preprocessing pipeline
- `umap_reducer.joblib` : UMAP reducer
- `kmeans_model.joblib` : KMeans model
- `top_categories.json` : top categories for cardinality limiting
- `cluster_sizes.png` : cluster distribution plot
- `metadata.json` : metadata JSON with metrics and parameters
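This README does not document the layout of `top_categories.json`. A minimal sketch of cardinality limiting, assuming the file maps each categorical column name to the list of categories to keep, and that every other value (including missing ones) is collapsed into a hypothetical `"Other"` bucket before preprocessing. Whatever the real layout and bucket label are, inference must apply the same rule the training pipeline used.

```python
import json

import pandas as pd

# Hypothetical label for categories outside the kept list.
OTHER_LABEL = "Other"


def load_top_categories(path="top_categories.json"):
    # Assumed layout: {"column_name": ["category_a", "category_b", ...], ...}
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def limit_cardinality(df: pd.DataFrame, top_categories, other_label=OTHER_LABEL):
    """Replace values not in each column's kept list with a single bucket."""
    df = df.copy()  # do not mutate the caller's DataFrame
    for column, keep in top_categories.items():
        if column in df.columns:
            # Series.where keeps values where the condition holds, else uses other_label.
            df[column] = df[column].where(df[column].isin(keep), other_label)
    return df
```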
## Usage
Load each artifact with `joblib.load()`, transform incoming data with the preprocessor, reduce it with the UMAP reducer, then predict cluster labels with the KMeans model.
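The loading and prediction steps above can be sketched as follows. This is a minimal sketch, assuming the three `.joblib` files sit in one directory and expose the standard `transform`/`predict` methods (scikit-learn transformers, `umap.UMAP.transform`, `KMeans.predict`). The function names are hypothetical. `umap-learn` must be installed for the reducer to unpickle, and if training limited cardinality with `top_categories.json`, apply the same step to `data` first.

```python
from pathlib import Path


def load_artifacts(model_dir="."):
    """Load the preprocessor, UMAP reducer, and KMeans model listed under Files."""
    import joblib  # imported here so assign_clusters can be used without joblib

    model_dir = Path(model_dir)
    preprocessor = joblib.load(model_dir / "preprocessor.joblib")
    reducer = joblib.load(model_dir / "umap_reducer.joblib")
    kmeans = joblib.load(model_dir / "kmeans_model.joblib")
    return preprocessor, reducer, kmeans


def assign_clusters(data, preprocessor, reducer, kmeans):
    """Preprocess, reduce with UMAP, then predict clusters, in that order."""
    features = preprocessor.transform(data)  # one-hot + scaled features
    embedding = reducer.transform(features)  # low-dimensional UMAP embedding
    labels = kmeans.predict(embedding)  # nearest-centroid cluster ids
    return embedding, labels
```

Returning the embedding as well as the labels makes it available for a silhouette check on new data without re-running UMAP.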
Automatic retraining can be triggered if the silhouette score on new data falls below 0.4.
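A minimal sketch of that retraining check using scikit-learn's `silhouette_score`. It assumes the score is computed on the UMAP embedding with the KMeans labels, which this README does not state explicitly. The function name is hypothetical, and the 0.4 default is the threshold recommended under Metrics.

```python
import numpy as np
from sklearn.metrics import silhouette_score

# Threshold recommended in the Metrics section of this README.
RETRAIN_THRESHOLD = 0.4


def needs_retrain(embedding, labels, threshold=RETRAIN_THRESHOLD):
    """Return (score, flag); flag is True when the score falls below threshold.

    silhouette_score raises ValueError unless the number of distinct labels
    is between 2 and n_samples - 1.
    """
    score = float(silhouette_score(np.asarray(embedding), np.asarray(labels)))
    return score, score < threshold
```

The silhouette score scales quadratically with the number of samples; for large batches, `silhouette_score` accepts `sample_size` and `random_state` to score a random subset instead.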
## License
Specify your license here.
---
*Generated and pushed by your clustering pipeline.*