# User Clustering Model

This repository contains models and artifacts for a user clustering pipeline.

## Models
- Preprocessor (OneHotEncoder + StandardScaler)
- UMAP reducer for dimensionality reduction
- KMeans clustering model with k=15

## Metrics
- Best silhouette score on the training data: 0.4733
- Recommended silhouette score threshold for triggering an automatic retrain: 0.4

## Files
- `preprocessor.joblib` : preprocessing pipeline
- `umap_reducer.joblib` : UMAP reducer
- `kmeans_model.joblib` : KMeans model
- `top_categories.json` : top categories for cardinality limiting
- `cluster_sizes.png` : cluster distribution plot
- `metadata.json` : metrics and parameters
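
The layout of `top_categories.json` is not documented here. As a minimal sketch, assuming it maps each categorical column name to the list of categories that were kept, cardinality limiting could look like the following. The schema, the `limit_categories` helper, and the `"Other"` bucket label are all assumptions for illustration, not the pipeline's confirmed behavior.

```python
import json
from pathlib import Path


def load_top_categories(path="top_categories.json"):
    """Read the category file; assumed schema: {column: [kept categories]}."""
    with Path(path).open(encoding="utf-8") as fh:
        return json.load(fh)


def limit_categories(record, top_categories, other_label="Other"):
    """Return a copy of `record` with unlisted categories replaced.

    `other_label` is a hypothetical bucket name; use whatever value the
    training pipeline actually used for rare categories.
    """
    limited = dict(record)
    for column, kept in top_categories.items():
        if column in limited and limited[column] not in kept:
            limited[column] = other_label
    return limited
```

If the real file uses a different structure, only `limit_categories` needs to change; the idea is that incoming data should see the same category vocabulary the preprocessor was fitted on.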

## Usage
Load the three `.joblib` artifacts with `joblib.load()`, then apply them in order: transform incoming data with the preprocessor, project the result with the UMAP reducer, and assign clusters with the KMeans model.
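
The steps above can be sketched as follows. This assumes the artifacts sit in one directory under the file names listed in Files, and that `joblib`, `scikit-learn`, and `umap-learn` are installed so the objects can be unpickled. The helper names `load_artifacts` and `predict_clusters` are illustrative and not part of the repository.

```python
from pathlib import Path


def load_artifacts(artifact_dir="."):
    """Load the preprocessor, UMAP reducer, and KMeans model from disk."""
    # Third-party import kept local so predict_clusters has no hard dependency.
    import joblib

    base = Path(artifact_dir)
    return (
        joblib.load(base / "preprocessor.joblib"),
        joblib.load(base / "umap_reducer.joblib"),
        joblib.load(base / "kmeans_model.joblib"),
    )


def predict_clusters(data, preprocessor, reducer, kmeans):
    """Run preprocess -> UMAP transform -> KMeans predict; return the labels."""
    features = preprocessor.transform(data)   # encode and scale raw columns
    embedding = reducer.transform(features)   # project into the UMAP space
    return kmeans.predict(embedding)          # cluster index for each row
```

A typical call would be `predict_clusters(new_data, *load_artifacts("path/to/artifacts"))`, where `new_data` has the same columns the preprocessor was fitted on.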

An automatic retrain can be triggered when the silhouette score on new data falls below 0.4.
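
A minimal sketch of that check is shown below. It assumes the silhouette score is computed on the UMAP embedding of the new batch with the predicted cluster labels; this README does not state which feature space the training score used, so adjust accordingly. The function names are illustrative.

```python
RETRAIN_THRESHOLD = 0.4  # recommended threshold from the Metrics section


def batch_silhouette(embedding, labels):
    """Silhouette score for a new batch (assumed: measured in UMAP space)."""
    # Third-party import kept local so should_retrain has no hard dependency.
    from sklearn.metrics import silhouette_score

    # silhouette_score raises ValueError unless the batch contains at least
    # 2 distinct labels and fewer distinct labels than samples.
    return float(silhouette_score(embedding, labels))


def should_retrain(score, threshold=RETRAIN_THRESHOLD):
    """Return True when the score falls strictly below the threshold."""
    return score < threshold
```

With the reported training score, `should_retrain(0.4733)` is `False`; a batch scoring 0.35 would return `True`.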

## License
Specify your license here.

---

*Generated and pushed by your clustering pipeline.*