---
title: Model Point Clustering
emoji: 🧮
colorFrom: yellow
colorTo: green
sdk: gradio
sdk_version: 5.31.0
app_file: app.py
pinned: false
license: mit
tags:
- actuarial
- clustering
- model-points
- insurance
- gradio
- data-science
- present-values
- policy-attributes
- cashflows
- machine-learning
short_description: Cluster insurance policies into representative model points.
---
# 🧮 Model Point Clustering Dashboard
An interactive dashboard for calibrating and evaluating **model points** using K-Means clustering. Designed for actuaries and data scientists working with large insurance portfolios.
[![Open in Spaces](https://huggingface.co/datasets/huggingface/badges/raw/main/open-in-hf-spaces-sm.svg)](https://huggingface.co/spaces/alidenewade/model-point-clustering)
---
## 📌 Overview
This application performs **cluster-based model point selection** by grouping similar policies to represent large portfolios more efficiently.
You can choose from three clustering calibration methods:
- **Annual Cashflows**
- **Policy Attributes**
- **Present Values**
The app then compares how well each calibration method reproduces the seriatim (policy-by-policy) results under the base, lapse stress, and mortality stress scenarios.
---
## 🔍 Use Cases
- Model point reduction for valuation and projections
- Policy summarization for faster simulations
- Stress testing comparison across representative points
- Actuarial model validation and calibration studies
---
## 📈 Features
### Calibration Methods
- **Cashflows**: Captures policy behavior over time.
- **Attributes**: Uses demographic/product characteristics.
- **Present Values**: Focuses on total liability or cashflow values.
### Interactive Tabs
- **Summary**: Bar chart of absolute PV Net Cashflow errors.
- **Cashflow Calibration**: Visual and tabular comparisons based on cashflows.
- **Policy Attribute Calibration**: Analysis using static policy data.
- **Present Value Calibration**: PV-based clustering with stress testing.
### Scenario Support
- Base Scenario
- Lapse Stress (+50%)
- Mortality Stress (+15%)
---
## 📁 Required Inputs
Upload **7 `.xlsx` files**, or use the example files by clicking **Load Example Data**.
| File | Description |
|------|-------------|
| `cashflows_seriatim_10K.xlsx` | Base cashflows per policy |
| `cashflows_seriatim_10K_lapse50.xlsx` | Cashflows under lapse stress |
| `cashflows_seriatim_10K_mort15.xlsx` | Cashflows under mortality stress |
| `model_point_table.xlsx` | Policy attributes (age, term, etc.) |
| `pv_seriatim_10K.xlsx` | Present values for base |
| `pv_seriatim_10K_lapse50.xlsx` | PVs under lapse stress |
| `pv_seriatim_10K_mort15.xlsx` | PVs under mortality stress |
Example directory structure:
```
├── app.py
└── eg_data/
    ├── cashflows_seriatim_10K.xlsx
    ├── cashflows_seriatim_10K_lapse50.xlsx
    ├── cashflows_seriatim_10K_mort15.xlsx
    ├── model_point_table.xlsx
    ├── pv_seriatim_10K.xlsx
    ├── pv_seriatim_10K_lapse50.xlsx
    └── pv_seriatim_10K_mort15.xlsx
```
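
Before uploading, it can help to confirm that a folder contains all seven workbooks. The sketch below is not part of `app.py`: the helper name `missing_inputs` and the default folder `eg_data` are assumptions made for illustration, while the file names are taken from the table above. It only checks that the files exist, not that their contents are valid.

```python
from pathlib import Path

# File names taken from the Required Inputs table.
REQUIRED_FILES = [
    "cashflows_seriatim_10K.xlsx",
    "cashflows_seriatim_10K_lapse50.xlsx",
    "cashflows_seriatim_10K_mort15.xlsx",
    "model_point_table.xlsx",
    "pv_seriatim_10K.xlsx",
    "pv_seriatim_10K_lapse50.xlsx",
    "pv_seriatim_10K_mort15.xlsx",
]


def missing_inputs(folder="eg_data"):
    """Return the required file names that are not present in `folder`.

    Hypothetical helper for illustration only.
    """
    root = Path(folder)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_inputs()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All 7 input files found.")
```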
---
## ⚙️ How to Use
1. **Launch the App**
   Click the "Open in Spaces" button or run `app.py`.
2. **Upload or Load Files**
   - Upload all 7 required `.xlsx` files.
   - Or click **"Load Example Data"**.
3. **Run Analysis**
   Click **"Analyze Dataset"** to generate cluster representatives, plots, and comparisons.
4. **Explore Tabs**
   - 📊 **Summary**: Calibration errors across scenarios.
   - 💸 **Cashflow Calibration**: Clustered vs. actual results, calibrated on cashflows.
   - 👤 **Policy Attribute Calibration**: Clustered vs. actual results, calibrated on policy attributes.
   - 💰 **Present Value Calibration**: Clustered vs. actual results, calibrated directly on PVs.
---
## 🧠 Behind the Scenes
### Core Engine: `Clusters` Class
Encapsulates K-Means logic for:
- Clustering using selected variables
- Selecting representative policies
- Aggregating actual vs estimated outputs
- Plotting cashflows, PVs, and scatter comparisons
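
The first three steps can be sketched with scikit-learn as follows. This is a minimal illustration, not the `Clusters` class from `app.py`: the function names `select_representatives` and `estimate_total`, the toy data, and the choice to weight each representative by its cluster's policy count are all assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin_min


def select_representatives(calibration, n_clusters, seed=0):
    """Cluster policies and pick one representative per cluster.

    calibration: array of shape (n_policies, n_variables).
    Returns (labels, rep_index), where rep_index[k] is the row of the
    policy closest to the centroid of cluster k.
    """
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = km.fit_predict(calibration)
    # For each centroid, find the index of the nearest policy.
    rep_index, _ = pairwise_distances_argmin_min(km.cluster_centers_, calibration)
    return labels, rep_index


def estimate_total(values, labels, rep_index):
    """Estimate the portfolio total from the representatives only.

    Each representative's value is multiplied by the number of policies
    in its cluster (an assumed weighting scheme).
    """
    counts = np.bincount(labels, minlength=len(rep_index))
    return float(np.sum(values[rep_index] * counts))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy portfolio: 1,000 policies, 3 calibration variables (made-up data).
    calibration = rng.normal(size=(1000, 3))
    values = calibration.sum(axis=1) + 10.0  # hypothetical per-policy value
    labels, rep_index = select_representatives(calibration, n_clusters=50)
    print("actual   :", values.sum())
    print("estimated:", estimate_total(values, labels, rep_index))
```

Running the module prints the seriatim total next to the estimate built from 50 representatives. Increasing `n_clusters` generally narrows the gap, at the cost of a larger model point set.
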
### Key Libraries
- `gradio` – UI and file interface
- `pandas`, `numpy` – Data manipulation
- `scikit-learn` – K-Means clustering
- `matplotlib`, `PIL` – Visualization
---
## 📊 Output Summary
The application generates:
- 📈 **Cluster vs Actual Comparisons**
- 🖼️ **Cashflow Time Series Plots**
- ⚖️ **Per-Cluster Scatter Plots**
- 📋 **Summary Tables**
- 📉 **Mean Absolute Error Bar Charts**
All results compare cluster-aggregated estimates directly against the corresponding totals from the full seriatim dataset.
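
One way to express such a comparison is the absolute relative error between the aggregated estimate and the seriatim total, computed per scenario. The sketch below uses made-up numbers; the function name `abs_relative_error` and the error definition itself are assumptions, since this README does not state the exact formula the app uses.

```python
def abs_relative_error(estimate, actual):
    """Return |estimate / actual - 1|; undefined when actual is zero."""
    if actual == 0:
        raise ZeroDivisionError("actual total is zero; relative error is undefined")
    return abs(estimate / actual - 1.0)


# Hypothetical PV of net cashflow totals: (seriatim actual, model point estimate).
totals = {
    "base": (1000.0, 990.0),
    "lapse50": (800.0, 816.0),
    "mort15": (500.0, 500.0),
}

errors = {name: abs_relative_error(est, act) for name, (act, est) in totals.items()}

for name, err in errors.items():
    print(f"{name:8s} {err:.2%}")
```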
---
## 📚 Attribution & References
Inspired by the [Lifelib](https://lifelib.io) open-source project:
> lifelib Developers. (2025). *Model Point Clustering*. In **lifelib: Life actuarial models in Python**.
> [https://github.com/lifelib-dev/lifelib](https://github.com/lifelib-dev/lifelib)
Notebook reference:
[Cluster Model Points – Lifelib Notebook](https://colab.research.google.com/github/lifelib-dev/lifelib/blob/current/lifelib/libraries/cluster/cluster_model_points.ipynb)
---
## 🛠️ Local Setup
To run locally:
```bash
# Clone the repo
git clone https://github.com/alidenewade/model-point-clustering.git
cd model-point-clustering
# Install dependencies
pip install -r requirements.txt
# Launch app
python app.py
```
---
## 📜 License
This project is open source under the MIT License.