---
title: Model Point Clustering
emoji: 🧮
colorFrom: yellow
colorTo: green
sdk: gradio
sdk_version: 5.31.0
app_file: app.py
pinned: false
license: mit
tags:
- actuarial
- clustering
- model-points
- insurance
- gradio
- data-science
- present-values
- policy-attributes
- cashflows
- machine-learning
short_description: Cluster insurance policies into representative model points.
---

# 🧮 Model Point Clustering Dashboard

An interactive dashboard for calibrating and evaluating model points with K-Means clustering, designed for actuaries and data scientists working with large insurance portfolios.

[![Open in Spaces](https://huggingface.co/datasets/huggingface/badges/raw/main/open-in-hf-spaces-sm.svg)](https://huggingface.co/spaces/alidenewade/model-point-clustering)

---

## 📌 Overview

This application selects model points by clustering: it groups similar policies with K-Means and picks a representative policy for each cluster, so that a much smaller set of model points can stand in for a large portfolio.

You can choose from three calibration methods, each of which clusters policies on a different set of variables:

- Annual Cashflows
- Policy Attributes
- Present Values

The app then compares how well each method replicates the actual (full-portfolio) results under the base, lapse-stress, and mortality-stress scenarios.

---

## 🔍 Use Cases

- Model point reduction for valuation and projections
- Policy summarization for faster simulations
- Stress-test comparisons using representative model points
- Actuarial model validation and calibration studies

---

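## 📈 Features

### Calibration Methods
- Annual Cashflows: Clusters on each policy's projected annual cashflows, capturing how policy behavior evolves over time.
- Policy Attributes: Clusters on demographic and product characteristics such as age and policy term.
- Present Values: Clusters on the present values of each policy's cashflows.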
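Policy attributes come in very different units (an age in years versus a sum assured in currency), and K-Means measures similarity with Euclidean distance, so an unscaled variable with large magnitudes would dominate the clustering. The README does not say how the app prepares its calibration variables; the sketch below assumes min-max scaling to [0, 1], one common choice. The helper name `minmax_scale` and the sample attributes are hypothetical.

```python
import numpy as np


def minmax_scale(columns):
    """Rescale each column of an (n_policies, n_vars) array to [0, 1].

    Hypothetical helper: assumes attributes are min-max scaled before
    K-Means so that no variable dominates the Euclidean distance.
    Constant columns map to 0 instead of causing a division by zero.
    """
    X = np.asarray(columns, dtype=float)
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # constant column: (X - lo) is 0 there anyway
    return (X - lo) / span


# Hypothetical attributes: age at entry, policy term (years), sum assured.
attrs = [[20, 10, 100_000],
         [40, 20, 500_000],
         [60, 30, 900_000]]
print(minmax_scale(attrs))  # rows become [0, 0, 0], [0.5, 0.5, 0.5], [1, 1, 1]
```

After scaling, a 20-year age gap and a 400,000 difference in sum assured contribute equally to the distance, which is the intent of the transformation.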

### Interactive Tabs
- Summary: Bar chart of the absolute errors in the present value (PV) of net cashflows.
- Cashflow Calibration: Visual and tabular comparisons based on cashflows.
- Policy Attribute Calibration: Analysis using static policy data.
- Present Value Calibration: PV-based clustering with stress testing.

### Scenario Support
- Base Scenario
- Lapse Stress (+50%)
- Mortality Stress (+15%)
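One way to read the scenario comparison is sketched below, under two assumptions that are not confirmed by this README: the representatives are chosen once from the calibration variables and then reused unchanged for every scenario, and each representative is weighted by the number of policies in its cluster. Reusing the same representatives is what makes the stressed results a test of how robust the clustering is. The function `totals_by_scenario` and all numbers are hypothetical.

```python
import numpy as np


def totals_by_scenario(scenario_values, rep_index, weights):
    """Return {scenario: (actual_total, estimated_total)}.

    scenario_values: dict mapping a scenario name to one value per policy
        (for example, the PV of net cashflows).
    rep_index: row index of each cluster's representative policy.
    weights: number of policies in each cluster (assumed scaling rule).
    The same representatives and weights are reused for every scenario.
    """
    reps = np.asarray(rep_index)
    w = np.asarray(weights, dtype=float)
    results = {}
    for name, values in scenario_values.items():
        v = np.asarray(values, dtype=float)
        results[name] = (float(v.sum()), float(w @ v[reps]))
    return results


# Hypothetical portfolio of 6 policies in 2 clusters, represented by
# rows 1 and 4 (0-based), each standing in for 3 policies.
scenarios = {
    "base":    [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
    "lapse50": [1.0, 1.5, 2.5, 8.0, 12.0, 16.0],
    "mort15":  [1.2, 2.2, 3.2, 11.0, 21.0, 31.0],
}
for name, (actual, estimate) in totals_by_scenario(scenarios, [1, 4], [3, 3]).items():
    print(f"{name}: actual={actual:.1f} estimate={estimate:.1f}")
```

In this toy data the representatives reproduce the base and mortality totals exactly but miss the lapse total slightly, which is the kind of gap the stress scenarios are meant to expose.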

---

## 📁 Required Inputs

Upload the 7 `.xlsx` files below, or click "Load Example Data" to use the bundled example files.

| File | Description |
|------|-------------|
| `cashflows_seriatim_10K.xlsx` | Cashflows per policy, base scenario |
| `cashflows_seriatim_10K_lapse50.xlsx` | Cashflows per policy, lapse stress |
| `cashflows_seriatim_10K_mort15.xlsx` | Cashflows per policy, mortality stress |
| `model_point_table.xlsx` | Policy attributes (age, term, etc.) |
| `pv_seriatim_10K.xlsx` | Present values per policy, base scenario |
| `pv_seriatim_10K_lapse50.xlsx` | Present values per policy, lapse stress |
| `pv_seriatim_10K_mort15.xlsx` | Present values per policy, mortality stress |

Example directory structure:

```
├── app.py
└── eg_data/
    ├── cashflows_seriatim_10K.xlsx
    ├── cashflows_seriatim_10K_lapse50.xlsx
    ├── cashflows_seriatim_10K_mort15.xlsx
    ├── model_point_table.xlsx
    ├── pv_seriatim_10K.xlsx
    ├── pv_seriatim_10K_lapse50.xlsx
    └── pv_seriatim_10K_mort15.xlsx
```
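Because every analysis needs all seven workbooks, it can help to check for them up front. The helper below is a hypothetical sketch, not part of `app.py`: it resolves the file names from the table above under a data directory (defaulting to the example `eg_data/` folder) and reports every missing file in one error rather than failing on the first.

```python
from pathlib import Path

# The seven workbook names listed in the Required Inputs table.
REQUIRED_FILES = (
    "cashflows_seriatim_10K.xlsx",
    "cashflows_seriatim_10K_lapse50.xlsx",
    "cashflows_seriatim_10K_mort15.xlsx",
    "model_point_table.xlsx",
    "pv_seriatim_10K.xlsx",
    "pv_seriatim_10K_lapse50.xlsx",
    "pv_seriatim_10K_mort15.xlsx",
)


def find_input_files(data_dir="eg_data"):
    """Map each required file name to its path under data_dir.

    Hypothetical helper. Raises FileNotFoundError naming every missing
    file, so the user sees all problems at once.
    """
    base = Path(data_dir)
    paths = {name: base / name for name in REQUIRED_FILES}
    missing = [name for name, path in paths.items() if not path.is_file()]
    if missing:
        raise FileNotFoundError(f"Missing input files in {base}: {', '.join(missing)}")
    return paths


try:
    paths = find_input_files("eg_data")
    print(f"Found all {len(paths)} input files.")
except FileNotFoundError as exc:
    print(exc)
```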

---

## ⚙️ How to Use

1. Launch the app: click the "Open in Spaces" button, or run `python app.py` locally (see Local Setup below).
2. Upload or load the input files:
   - Upload all 7 required `.xlsx` files.
   - Or click "Load Example Data".
3. Run the analysis: click "Analyze Dataset" to generate the cluster representatives, plots, and comparisons.
4. Explore the tabs:
   - 📊 Summary: Calibration errors across scenarios.
   - 💸 Cashflow Calibration: Clustered vs. actual results, calibrated on cashflows.
   - 👤 Policy Attribute Calibration: Clustered vs. actual results, calibrated on policy attributes.
   - 💰 Present Value Calibration: Clustered vs. actual results, calibrated on present values.

---

## 🧠 Behind the Scenes

### Core Engine: `Clusters` Class
Encapsulates K-Means logic for:
- Clustering using selected variables
- Selecting representative policies
- Aggregating actual vs estimated outputs
- Plotting cashflows, PVs, and scatter comparisons
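The clustering and representative-selection steps can be sketched with scikit-learn as follows. This is a minimal illustration, not the `Clusters` class itself, and it rests on two assumptions: each cluster's representative is the actual policy nearest to the cluster centroid, and portfolio estimates weight each representative by the number of policies in its cluster. The function names and data are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin_min


def select_model_points(features, n_clusters, seed=0):
    """Cluster policies with K-Means and pick one representative per cluster.

    features: (n_policies, n_vars) array of calibration variables.
    Returns (rep_index, cluster_size), both indexed by cluster:
      rep_index[k]    row of the actual policy nearest to centroid k
      cluster_size[k] number of policies assigned to cluster k
    """
    X = np.asarray(features, dtype=float)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X)
    # For each centroid, the index of the closest policy (Euclidean distance).
    rep_index, _ = pairwise_distances_argmin_min(km.cluster_centers_, X)
    # Policies per cluster; labels_ uses the same cluster order as the centroids.
    cluster_size = np.bincount(km.labels_, minlength=n_clusters)
    return rep_index, cluster_size


def estimate_total(values, rep_index, cluster_size):
    """Estimate portfolio totals by weighting each representative's values
    by its cluster size (an assumed scaling rule).

    values: one value per policy, or an (n_policies, n_outputs) array.
    """
    V = np.asarray(values, dtype=float)
    return cluster_size @ V[rep_index]


# Hypothetical calibration variables for 6 policies forming two clear groups.
X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
pv = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]  # hypothetical PV per policy

reps, sizes = select_model_points(X, n_clusters=2)
print("representatives:", sorted(reps.tolist()))
print("estimated total:", estimate_total(pv, reps, sizes), "actual:", sum(pv))
```

Here the representatives sit nearest the centroids in feature space yet have below-average PVs, so the estimate (33) undershoots the actual total (66). That is why the choice of calibration variables matters, and why the app compares the three methods.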

### Key Libraries
- `gradio` – UI and file interface
- `pandas`, `numpy` – Data manipulation
- `scikit-learn` – K-Means clustering
- `matplotlib`, `PIL` – Visualization

---

## 📊 Output Summary

The application generates:

- 📈 Cluster vs Actual Comparisons
- 🖼️ Cashflow Time Series Plots
- ⚖️ Per-Cluster Scatter Plots
- 📋 Summary Tables
- 📉 Mean Absolute Error Bar Charts

All results compare the estimates aggregated from the cluster representatives against the corresponding metrics from the original full (seriatim) dataset.
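The comparison of cluster-based estimates with full-dataset values reduces to a few simple error measures. The README does not define exactly which errors the app plots; the sketch below assumes per-period errors, the relative error of the grand total, and the mean absolute error across periods. `compare_aggregates` and the figures are hypothetical.

```python
import numpy as np


def compare_aggregates(actual, estimate):
    """Compare full-portfolio (seriatim) aggregates with cluster-based estimates.

    actual, estimate: arrays of the same shape, e.g. total net cashflow
    per projection year. Returns (per-element error, relative error of
    the grand total, mean absolute error across elements).
    """
    a = np.asarray(actual, dtype=float)
    e = np.asarray(estimate, dtype=float)
    error = e - a
    total_rel_error = (e.sum() - a.sum()) / a.sum()
    mae = np.abs(error).mean()
    return error, total_rel_error, mae


actual = [100.0, 80.0, 60.0]    # hypothetical seriatim net cashflows by year
estimate = [110.0, 75.0, 60.0]  # hypothetical cluster-based estimates
error, rel, mae = compare_aggregates(actual, estimate)
print(error, f"{rel:.2%}", mae)
```

The two summaries answer different questions: the total's relative error can look small when yearly errors of opposite sign cancel, while the mean absolute error does not allow that cancellation.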

---

## 📚 Attribution & References

Inspired by the [lifelib](https://lifelib.io) open-source project:

> lifelib Developers. (2025). Model Point Clustering. In lifelib: Life actuarial models in Python.
> [https://github.com/lifelib-dev/lifelib](https://github.com/lifelib-dev/lifelib)

Notebook reference:
[Cluster Model Points – lifelib Notebook](https://colab.research.google.com/github/lifelib-dev/lifelib/blob/current/lifelib/libraries/cluster/cluster_model_points.ipynb)

---

## 🛠️ Local Setup

To run locally:

```bash
# Clone the repo
git clone https://github.com/alidenewade/model-point-clustering.git
cd model-point-clustering

# Install dependencies
pip install -r requirements.txt

# Launch app
python app.py
```

---

## 📜 License

This project is open source under the MIT License.