---
license: apache-2.0
tags:
- xgboost
- classification
- grant-matching
- win-probability
- nonprofit
- foundation-grants
datasets:
- ArkMaster123/grantpilot-training-data
language:
- en
---

# GrantPilot Win Probability Classifier V2 (Federal + Foundation)

XGBoost classifier for predicting grant funding success. V2 extends coverage from federal-only (NIH/NSF) to include **37,684 private foundations**.

> **See also:** [V1 (federal-only)](https://huggingface.co/ArkMaster123/grantpilot-classifier)

## Performance

| Metric | V1 (Federal Only) | V2 (Combined) | Change |
|--------|-------------------|---------------|--------|
| **Overall AUC-ROC** | 0.837 | **0.997** | +19.1% |
| **Federal AUC** | 0.837 | **0.913** | +9.1% |
| Brier Score | 0.167 | **0.014** | -91.6% |
| Accuracy | 72.1% | **98.3%** | +26.2 pp |
| Precision | 47.4% | **97.1%** | +49.7 pp |
| Recall | 79.9% | **99.6%** | +19.7 pp |
| F1 Score | 0.595 | **0.983** | +65.2% |

### Federal Regression Check: PASS

Federal-only AUC improved from 0.837 to **0.913**, well above the 0.817 minimum threshold.

## Important Context

The classifier itself is strong, but the **embedding model feeding it is not**; see the [grantpilot-embedding-v2](https://huggingface.co/ArkMaster123/grantpilot-embedding-v2) benchmark results. The V2 embedding underperforms OpenAI on retrieval (unlike V1, which beat OpenAI). The classifier compensates because it uses multiple features beyond cosine similarity alone.

## Model Architecture

```
Input Features:
├── cosine_similarity (from grantpilot-embedding-v2)
├── funder_type (categorical: FOUNDATION, FEDERAL)
├── source (categorical: NIH, NSF, FOUNDATIONS)
├── log_amount (log-transformed grant amount)
├── org_text_length
└── grant_text_length

↓ XGBoost Classifier
↓ Isotonic Calibration
→ Win Probability (0-100%)
```

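The calibration stage can be sketched with scikit-learn's `IsotonicRegression`. This is a minimal illustration of the technique, not the repository's training code; the scores and labels below are synthetic stand-ins for real validation data:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)

# Stand-ins for validation data: raw classifier scores and true outcomes,
# deliberately miscalibrated so that P(win | score s) is roughly s**2
raw_scores = rng.uniform(0.0, 1.0, size=2000)
labels = (rng.uniform(0.0, 1.0, size=2000) < raw_scores**2).astype(int)

# Fit the isotonic calibrator on the held-out scores
calibrator = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
calibrator.fit(raw_scores, labels)

# Map new raw scores to calibrated win probabilities on a 0-100% scale
win_probability = calibrator.predict(np.array([0.2, 0.5, 0.9])) * 100
```

Isotonic regression learns a monotone mapping from raw scores to observed win rates, which is what turns the raw XGBoost output into the calibrated 0-100% win probability this model reports.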
## Training Data

| Split | Foundation | NIH | NSF | Total |
|-------|-----------|-----|-----|-------|
| Train | 584,802 | 51,434 | 12,638 | 648,874 |
| Val | 73,240 | 6,445 | 1,599 | 81,284 |
| Test | 73,022 | 6,384 | 1,588 | 80,994 |

Foundation data sourced from IRS 990-PF e-filings via GivingTuesday (680,970 grants, 88% with purpose text).

## Training Details

- **Hardware**: NVIDIA H100 80GB
- **XGBoost**: max_depth=6, lr=0.1, n_estimators=200, subsample=0.8
- **Calibration**: Isotonic regression on validation set
- **Batch Size**: 256 for embedding feature computation

## Usage

```python
import pickle

import xgboost as xgb
from huggingface_hub import hf_hub_download

# Download model files from the Hub
model_path = hf_hub_download("ArkMaster123/grantpilot-classifier-v2", "xgboost_model.json")
scaler_path = hf_hub_download("ArkMaster123/grantpilot-classifier-v2", "scaler.pkl")
calibrator_path = hf_hub_download("ArkMaster123/grantpilot-classifier-v2", "isotonic_calibrator.pkl")

# Load the booster, feature scaler, and isotonic calibrator
model = xgb.Booster()
model.load_model(model_path)

with open(scaler_path, "rb") as f:
    scaler = pickle.load(f)
with open(calibrator_path, "rb") as f:
    calibrator = pickle.load(f)

# Predict: `features` is a 2D array of shape (n_samples, 6), with columns
# in the same order as the training features (see Model Architecture)
features_scaled = scaler.transform(features)
dmatrix = xgb.DMatrix(features_scaled)
raw_pred = model.predict(dmatrix)
win_probability = calibrator.predict(raw_pred) * 100
```

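The usage snippet leaves `features` undefined. One hypothetical way to assemble a feature row is shown below; the column order follows the Model Architecture section, but the integer codes for `funder_type` and `source` are assumptions, not the mapping used in training:

```python
import numpy as np

# Hypothetical categorical encodings -- the real training mapping may differ
FUNDER_TYPE = {"FEDERAL": 0, "FOUNDATION": 1}
SOURCE = {"NIH": 0, "NSF": 1, "FOUNDATIONS": 2}

def build_feature_row(cosine_similarity, funder_type, source,
                      amount, org_text, grant_text):
    """Assemble one row in the feature order listed under Model Architecture."""
    return np.array([
        cosine_similarity,
        FUNDER_TYPE[funder_type],
        SOURCE[source],
        np.log1p(amount),   # log_amount
        len(org_text),      # org_text_length
        len(grant_text),    # grant_text_length
    ], dtype=np.float64)

# One org/grant pair -> shape (1, 6), ready for scaler.transform
features = build_feature_row(0.73, "FOUNDATION", "FOUNDATIONS", 50_000,
                             "Org mission statement...",
                             "Grant purpose text...").reshape(1, -1)
```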
## Related Models

| Model | Description |
|-------|-------------|
| [grantpilot-embedding-v2](https://huggingface.co/ArkMaster123/grantpilot-embedding-v2) | V2 embedding (required for the cosine_similarity feature) |
| [grantpilot-embedding](https://huggingface.co/ArkMaster123/grantpilot-embedding) | V1 federal-only embedding; beats OpenAI on retrieval |
| [grantpilot-classifier](https://huggingface.co/ArkMaster123/grantpilot-classifier) | V1 federal-only classifier (AUC 0.837) |
| [grantpilot-training-data](https://huggingface.co/datasets/ArkMaster123/grantpilot-training-data) | Training data (V1 at training/, V2 at training_v2/) |