---
title: Model Point Clustering
emoji: 🧮
colorFrom: yellow
colorTo: green
sdk: gradio
sdk_version: 5.31.0
app_file: app.py
pinned: false
license: mit
tags:
- actuarial
- clustering
- model-points
- insurance
- gradio
- data-science
- present-values
- policy-attributes
- cashflows
- machine-learning
short_description: Cluster insurance policies into representative model points.
---

# 🧮 Model Point Clustering Dashboard

An interactive dashboard for calibrating and evaluating **model points** using K-Means clustering. Designed for actuaries and data scientists working with large insurance portfolios.

[Open in Spaces](https://huggingface.co/spaces/alidenewade/model-point-clustering)

---

## Overview

This application performs **cluster-based model point selection**: it groups similar policies so that a small set of representative policies can stand in for a large portfolio.

You can choose from three clustering calibration methods:

- **Annual Cashflows**
- **Policy Attributes**
- **Present Values**

The dashboard compares how well each calibration method replicates the actual values under the base, lapse stress, and mortality stress scenarios.

---

## Use Cases

- Model point reduction for valuation and projections
- Policy summarization for faster simulations
- Stress testing comparison across representative points
- Actuarial model validation and calibration studies

---

## Features

### Calibration Methods

- **Cashflows**: Captures policy behavior over time.
- **Attributes**: Uses demographic/product characteristics.
- **Present Values**: Focuses on total liability or cashflow values.

### Interactive Tabs

- **Summary**: Bar chart of absolute PV Net Cashflow errors.
- **Cashflow Calibration**: Visual and tabular comparisons based on cashflows.
- **Policy Attribute Calibration**: Analysis using static policy data.
- **Present Value Calibration**: PV-based clustering with stress testing.

### Scenario Support

- Base Scenario
- Lapse Stress (+50%)
- Mortality Stress (+15%)

---

## Required Inputs

Upload the **7 `.xlsx` files** listed below, or use the bundled example files by clicking **Load Example Data**.

| File | Description |
|------|-------------|
| `cashflows_seriatim_10K.xlsx` | Base cashflows per policy |
| `cashflows_seriatim_10K_lapse50.xlsx` | Cashflows under lapse stress |
| `cashflows_seriatim_10K_mort15.xlsx` | Cashflows under mortality stress |
| `model_point_table.xlsx` | Policy attributes (age, term, etc.) |
| `pv_seriatim_10K.xlsx` | Present values under the base scenario |
| `pv_seriatim_10K_lapse50.xlsx` | Present values under lapse stress |
| `pv_seriatim_10K_mort15.xlsx` | Present values under mortality stress |

Example directory structure:

```
├── app.py
└── eg_data/
    ├── cashflows_seriatim_10K.xlsx
    ├── cashflows_seriatim_10K_lapse50.xlsx
    ├── cashflows_seriatim_10K_mort15.xlsx
    ├── model_point_table.xlsx
    ├── pv_seriatim_10K.xlsx
    ├── pv_seriatim_10K_lapse50.xlsx
    └── pv_seriatim_10K_mort15.xlsx
```
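
Before running the app locally, it can help to confirm that all seven input files are in place. The following is a minimal sketch that uses only the standard library; `find_missing_inputs` is a hypothetical helper and not part of `app.py`, and it assumes the files sit in a single folder such as `eg_data/`.

```python
from pathlib import Path

# File names as listed in the Required Inputs table.
REQUIRED_FILES = [
    "cashflows_seriatim_10K.xlsx",
    "cashflows_seriatim_10K_lapse50.xlsx",
    "cashflows_seriatim_10K_mort15.xlsx",
    "model_point_table.xlsx",
    "pv_seriatim_10K.xlsx",
    "pv_seriatim_10K_lapse50.xlsx",
    "pv_seriatim_10K_mort15.xlsx",
]


def find_missing_inputs(data_dir):
    """Return the required file names that are absent from data_dir.

    Hypothetical helper for illustration; the app itself may validate
    its uploads differently.
    """
    folder = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]


if __name__ == "__main__":
    missing = find_missing_inputs("eg_data")
    if missing:
        print("Missing input files:", ", ".join(missing))
    else:
        print("All 7 input files found.")
```

An empty list means the folder is complete; otherwise the returned names say exactly which files still need to be supplied.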

---

## How to Use

1. **Launch the App**
   Click the "Open in Spaces" button or run `app.py`.
2. **Upload or Load Files**
   - Upload all 7 required `.xlsx` files.
   - Or click **"Load Example Data"**.
3. **Run Analysis**
   Click **"Analyze Dataset"** to generate cluster representatives, plots, and comparisons.
4. **Explore Tabs**
   - **Summary**: Calibration errors across scenarios.
   - **Cashflow Calibration**: Clustered vs actual results, calibrated on cashflows.
   - **Policy Attribute Calibration**: Clustered vs actual results, calibrated on policy attributes.
   - **Present Value Calibration**: Clustered vs actual results, calibrated directly on present values.

---

## Behind the Scenes

### Core Engine: `Clusters` Class

Encapsulates K-Means logic for:

- Clustering using selected variables
- Selecting representative policies
- Aggregating actual vs estimated outputs
- Plotting cashflows, PVs, and scatter comparisons
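
The clustering and representative-selection steps listed above can be sketched with scikit-learn. This is an illustrative sketch, not the app's actual `Clusters` class: the function names `select_representatives` and `estimate_total` are hypothetical, and it assumes that each representative is the policy nearest to a K-Means centroid and that portfolio totals are estimated by scaling each representative by the number of policies in its cluster.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin_min


def select_representatives(calibration_vars, n_clusters, random_state=0):
    """Cluster policies and pick the policy nearest to each centroid.

    calibration_vars: 2-D array-like, one row per policy.
    Returns (rep_index, policy_count): the row index of the representative
    for each cluster and the number of policies assigned to that cluster.
    """
    X = np.ascontiguousarray(calibration_vars, dtype=float)
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state).fit(X)
    # For every centroid, find the index of the nearest actual policy.
    rep_index, _ = pairwise_distances_argmin_min(kmeans.cluster_centers_, X)
    # policy_count[k] is the number of policies labelled with cluster k.
    policy_count = np.bincount(kmeans.labels_, minlength=n_clusters)
    return rep_index, policy_count


def estimate_total(values, rep_index, policy_count):
    """Estimate portfolio totals by weighting each representative's row
    by its cluster size (assumed aggregation rule).

    values: 2-D array-like of shape (n_policies, n_columns).
    """
    values = np.asarray(values, dtype=float)
    return (values[rep_index] * policy_count[:, None]).sum(axis=0)
```

Passing the cashflow, attribute, or present-value columns as `calibration_vars` corresponds to the three calibration methods; the same `rep_index` and `policy_count` can then be applied to any output table to compare estimated and actual totals.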

### Key Libraries

- `gradio` - UI and file interface
- `pandas`, `numpy` - Data manipulation
- `scikit-learn` - K-Means clustering
- `matplotlib`, `PIL` - Visualization

---

## Output Summary

The application generates:

- **Cluster vs Actual Comparisons**
- **Cashflow Time Series Plots**
- **Per-Cluster Scatter Plots**
- **Summary Tables**
- **Mean Absolute Error Bar Charts**

All results come from directly comparing the cluster-aggregated estimates against the corresponding totals from the original full (seriatim) dataset.
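
The comparison of estimates against actual totals can be sketched in plain Python. The exact error definition used by the app is not stated here, so this sketch assumes a relative error (`estimate / actual - 1`) per scenario, with the absolute values averaged for the bar chart; all function names are hypothetical.

```python
def relative_error(estimate, actual):
    """Signed relative error of a cluster-based estimate vs the actual total."""
    if actual == 0:
        raise ValueError("actual total is zero; relative error is undefined")
    return estimate / actual - 1.0


def absolute_errors(estimates, actuals):
    """Absolute relative error per scenario.

    estimates, actuals: dicts keyed by scenario name (e.g. "base").
    """
    return {name: abs(relative_error(estimates[name], actuals[name])) for name in actuals}


def mean_absolute_error(errors):
    """Average of the per-scenario absolute errors."""
    return sum(errors.values()) / len(errors)
```

A value of 0 means the representative policies reproduce the full-portfolio total exactly; larger values indicate a poorer calibration for that scenario.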

---

## Attribution & References

Inspired by the [Lifelib](https://lifelib.io) open-source project:

> lifelib Developers. (2025). *Model Point Clustering*. In **lifelib: Life actuarial models in Python**.
> [https://github.com/lifelib-dev/lifelib](https://github.com/lifelib-dev/lifelib)

Notebook reference:
[Cluster Model Points - Lifelib Notebook](https://colab.research.google.com/github/lifelib-dev/lifelib/blob/current/lifelib/libraries/cluster/cluster_model_points.ipynb)

---

## Local Setup

To run locally:

```bash
# Clone the repo
git clone https://github.com/alidenewade/model-point-clustering.git
cd model-point-clustering

# Install dependencies
pip install -r requirements.txt

# Launch app
python app.py
```

---

## License

This project is open source under the MIT License.