# User Clustering Model

This repository contains models and artifacts for a user clustering pipeline.

## Models

- Preprocessor (OneHotEncoder + StandardScaler)
- UMAP reducer for dimensionality reduction
- KMeans clustering model with k=15

## Metrics

- Best silhouette score on training: 0.4733
- Recommended silhouette score threshold for triggering auto retrain: 0.4

## Files

- `preprocessor.joblib`: preprocessing pipeline
- `umap_reducer.joblib`: UMAP reducer
- `kmeans_model.joblib`: KMeans model
- `top_categories.json`: top categories for cardinality limiting
- `cluster_sizes.png`: cluster distribution plot
- `metadata.json`: metadata JSON with metrics and parameters

## Usage

Load the models using `joblib.load()`, preprocess incoming data with the preprocessor, transform with UMAP, then predict clusters using KMeans. Auto retrain can be triggered if silhouette score on new data falls below 0.4.

## License

Specify your license here.

---

*Generated and pushed by your clustering pipeline.*
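
The steps described under Usage can be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual code: the helper names (`load_artifacts`, `assign_clusters`, `silhouette_on`, `should_retrain`) are hypothetical, and it assumes the incoming data already has the columns and category values the preprocessor was fitted on (for example, with rare categories already mapped using `top_categories.json`, whose format is not described here).

```python
from pathlib import Path

# Retrain threshold recommended in the Metrics section of this README.
SILHOUETTE_THRESHOLD = 0.4


def load_artifacts(model_dir="."):
    """Load the three fitted artifacts listed under Files (hypothetical helper)."""
    import joblib  # imported lazily so the other helpers work without it

    model_dir = Path(model_dir)
    preprocessor = joblib.load(model_dir / "preprocessor.joblib")
    reducer = joblib.load(model_dir / "umap_reducer.joblib")
    kmeans = joblib.load(model_dir / "kmeans_model.joblib")
    return preprocessor, reducer, kmeans


def assign_clusters(preprocessor, reducer, kmeans, raw_data):
    """Run preprocess -> UMAP transform -> KMeans predict on new data."""
    features = preprocessor.transform(raw_data)
    embedding = reducer.transform(features)
    labels = kmeans.predict(embedding)
    return embedding, labels


def silhouette_on(embedding, labels):
    """Silhouette score of the predicted labels in the reduced space.

    Assumption: the score is computed on the UMAP embedding; the README
    does not say which space the training score of 0.4733 was measured in.
    """
    from sklearn.metrics import silhouette_score

    return silhouette_score(embedding, labels)


def should_retrain(score, threshold=SILHOUETTE_THRESHOLD):
    """True when the silhouette score falls below the retrain threshold."""
    return score < threshold
```

A caller would typically chain these as `embedding, labels = assign_clusters(*load_artifacts("path/to/models"), new_data)` and then pass `silhouette_on(embedding, labels)` to `should_retrain`. Two caveats apply: unpickling `umap_reducer.joblib` requires the `umap-learn` package to be installed, and `silhouette_score` raises an error if the batch ends up in a single cluster, so very small batches may need to be skipped or accumulated first.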