---
license: apache-2.0
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- grant-matching
- nonprofit
- foundation-grants
base_model: Qwen/Qwen3-Embedding-0.6B
datasets:
- ArkMaster123/grantpilot-training-data
language:
- en
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---
# GrantPilot Embedding V2 (Federal + Foundation)
Fine-tuned [Qwen3-Embedding-0.6B](https://huggingface.co/Qwen/Qwen3-Embedding-0.6B) for grant-organization semantic matching. V2 extends coverage from federal-only (NIH/NSF) to include **37,684 private foundations**.
> **See also:** [V1 (federal-only)](https://huggingface.co/ArkMaster123/grantpilot-embedding) which outperforms OpenAI on federal grant retrieval.
## Embedding Benchmark Results
Benchmarked on 998 test pairs (901 foundation, 78 NIH, 19 NSF) using retrieval and classification metrics.
### Retrieval Quality
| Model | Dim | R@1 | R@5 | R@10 | MRR | NDCG@10 |
|-------|-----|-----|-----|------|-----|---------|
| OpenAI text-embedding-3-small | 1536 | **0.343** | **0.570** | **0.682** | **0.453** | **0.499** |
| Qwen3-Embedding-0.6B (base) | 1024 | 0.295 | 0.514 | 0.630 | 0.403 | 0.449 |
| **GrantPilot V2 (this model)** | 1024 | 0.295 | 0.516 | 0.622 | 0.403 | 0.446 |
**Verdict: OpenAI wins on retrieval.** The fine-tuned V2 embedding performs on par with the base Qwen3 model; fine-tuning did not meaningfully improve retrieval on this mixed dataset. V1 (federal-only) significantly outperformed OpenAI on federal retrieval, but adding 90% foundation data diluted that specialization.
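For reproducibility, these metrics can be recomputed from a query-by-document similarity matrix. A minimal sketch (assuming, as in the paired test set, exactly one relevant grant per organization; the actual evaluation script is not published here):

```python
import numpy as np

def retrieval_metrics(sim, k_values=(1, 5, 10)):
    """Recall@k, MRR, and NDCG@10 for a query-by-doc similarity matrix
    where the relevant doc for query i is doc i (one positive per query,
    as in a paired test set)."""
    n = sim.shape[0]
    order = np.argsort(-sim, axis=1)                      # docs ranked best-first
    ranks = np.array([int(np.where(order[i] == i)[0][0])  # rank of the true doc
                      for i in range(n)])
    metrics = {f"R@{k}": float(np.mean(ranks < k)) for k in k_values}
    metrics["MRR"] = float(np.mean(1.0 / (ranks + 1)))
    # With a single relevant doc the ideal DCG is 1, so NDCG@10 reduces to:
    metrics["NDCG@10"] = float(np.mean(np.where(ranks < 10,
                                                1.0 / np.log2(ranks + 2), 0.0)))
    return metrics

# Toy check: every query ranks its own doc first, so all metrics are 1.0
sim = np.array([[0.9, 0.2, 0.1],
                [0.3, 0.8, 0.4],
                [0.2, 0.1, 0.7]])
print(retrieval_metrics(sim))
```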
### AUC as Classifier Feature
| Model | Overall AUC | Foundation AUC | NIH AUC | NSF AUC |
|-------|-------------|----------------|---------|---------|
| OpenAI text-embedding-3-small | **0.886** | **0.972** | 0.473 | 0.524 |
| Qwen3-Embedding-0.6B (base) | 0.881 | 0.965 | 0.611 | 0.548 |
| **GrantPilot V2 (this model)** | 0.881 | 0.965 | **0.614** | 0.548 |
Notably, OpenAI has the best overall AUC but the **worst federal AUC** (0.473 on NIH, worse than random). Our fine-tuned model is the strongest on federal grants.
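The AUC here treats embedding similarity as a score for separating matched from unmatched pairs, computed overall and per grant source. A hedged sketch with scikit-learn; the scores, labels, and `source` array below are illustrative toy data, not drawn from the test set:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Toy data: similarity scores for candidate (org, grant) pairs,
# binary match labels, and the grant's source for subsetting.
scores = np.array([0.82, 0.31, 0.77, 0.45, 0.90, 0.12, 0.66, 0.40])
labels = np.array([1,    0,    1,    0,    1,    0,    1,    0])
source = np.array(["foundation"] * 4 + ["nih"] * 4)

print("Overall AUC:", roc_auc_score(labels, scores))
for s in ("foundation", "nih"):
    mask = source == s
    print(f"{s} AUC:", roc_auc_score(labels[mask], scores[mask]))
```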
### Inference Latency
| Model | Avg Latency | Cost |
|-------|-------------|------|
| OpenAI text-embedding-3-small | 43.9ms | API cost |
| Qwen3-Embedding-0.6B (base) | 2.9ms | Free (self-hosted) |
| **GrantPilot V2 (this model)** | **1.7ms** | Free (self-hosted) |
**25x faster than OpenAI** with zero API cost.
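Latency figures like these can be reproduced with a simple wall-clock timing harness. A sketch; the warmup and run counts are arbitrary choices, and `sum` stands in for `model.encode` so the snippet runs without downloading the model:

```python
import time

def avg_latency_ms(fn, *args, warmup=5, runs=100):
    """Average wall-clock latency of fn(*args) in milliseconds,
    after a few warmup calls to exclude one-time setup cost."""
    for _ in range(warmup):
        fn(*args)
    start = time.perf_counter()
    for _ in range(runs):
        fn(*args)
    return (time.perf_counter() - start) / runs * 1000.0

# With the real model this would be:
#   model = SentenceTransformer("ArkMaster123/grantpilot-embedding-v2")
#   avg_latency_ms(model.encode, ["some grant text"])
print(avg_latency_ms(sum, list(range(1000))))
```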
### Comparison with V1
| Metric | V1 vs OpenAI | V2 vs OpenAI |
|--------|-------------|-------------|
| R@1 | **V1 wins (+46%)** | OpenAI wins |
| R@5 | **V1 wins (+22%)** | OpenAI wins |
| R@10 | **V1 wins (+28%)** | OpenAI wins |
V1 beat OpenAI decisively on federal grants. V2 lost that edge by training on a dataset that is 90% foundation data.
## Why Use This Model?
The embedding alone is not the star; the **XGBoost classifier built on top** is where the real value comes from:
| Classifier Metric | V1 | V2 |
|-------------------|----|----|
| Overall AUC | 0.837 | **0.997** |
| Federal AUC | 0.837 | **0.913** |
| Accuracy | 72.1% | **98.3%** |
| F1 | 0.595 | **0.983** |
See: [grantpilot-classifier-v2](https://huggingface.co/ArkMaster123/grantpilot-classifier-v2)
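To illustrate how a classifier can sit on top of pair embeddings, here is a hedged sketch. The feature construction (concatenating the two embeddings plus their cosine similarity) is an assumption, and scikit-learn's `GradientBoostingClassifier` stands in for XGBoost; the actual V2 features and model live in the grantpilot-classifier-v2 repo:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Hypothetical feature construction: concatenate org and grant embeddings
# and append their cosine similarity as a scalar feature.
def pair_features(org_emb, grant_emb):
    cos = np.sum(org_emb * grant_emb, axis=1, keepdims=True) / (
        np.linalg.norm(org_emb, axis=1, keepdims=True)
        * np.linalg.norm(grant_emb, axis=1, keepdims=True))
    return np.hstack([org_emb, grant_emb, cos])

dim = 16  # toy dimension; the real embeddings are 1024-d
org = rng.normal(size=(200, dim))
grant = rng.normal(size=(200, dim))
labels = rng.integers(0, 2, size=200)

X = pair_features(org, grant)
clf = GradientBoostingClassifier().fit(X, labels)  # XGBoost in production
print("train accuracy:", clf.score(X, labels))
```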
## Training Details
- **Hardware**: NVIDIA H100 80GB
- **Training Steps**: 1,000 (LoRA fine-tuning)
- **Training Pairs**: 324,479 positive pairs
- **LoRA Config**: r=16, alpha=32, target=q/k/v/o projections
- **Batch Size**: 32 (x4 gradient accumulation = 128 effective)
- **Learning Rate**: 2e-5
- **Final Val Loss**: 0.1458
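The LoRA setup above corresponds roughly to the following `peft` configuration. This is a sketch, not the published training script; the dropout value is an assumption not stated in the card:

```python
from peft import LoraConfig

# Illustrative reconstruction of the LoRA config listed above
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,  # assumption: not stated in the model card
    task_type="FEATURE_EXTRACTION",
)
```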
### Training Data Composition
| Source | Pairs | % |
|--------|-------|---|
| Foundation (990-PF) | 292,401 | 90.1% |
| NIH | 25,717 | 7.9% |
| NSF | 6,361 | 2.0% |
## Usage
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("ArkMaster123/grantpilot-embedding-v2", trust_remote_code=True)

org_text = "Organization: Ford Foundation\nLocation: New York, NY\nType: FOUNDATION"
grant_text = "Grant: Support for civil society organizations\nAmount: $500,000"

# Normalize so the dot product below is the cosine similarity
embeddings = model.encode([org_text, grant_text], normalize_embeddings=True)
similarity = embeddings[0] @ embeddings[1]
```
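To match one organization against a pool of grants, encode everything once and rank by cosine similarity. A sketch using random stand-in vectors so it runs without the model download; with the real model, the vectors come from `model.encode(..., normalize_embeddings=True)`:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in unit vectors in place of real org/grant embeddings (1024-d)
org_emb = rng.normal(size=1024)
grant_embs = rng.normal(size=(5, 1024))
org_emb /= np.linalg.norm(org_emb)
grant_embs /= np.linalg.norm(grant_embs, axis=1, keepdims=True)

scores = grant_embs @ org_emb   # cosine similarities
ranking = np.argsort(-scores)   # best-matching grant first
print(ranking, scores[ranking])
```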
## Related Models
| Model | Description |
|-------|-------------|
| [grantpilot-embedding](https://huggingface.co/ArkMaster123/grantpilot-embedding) | V1: federal-only, beats OpenAI on retrieval |
| [grantpilot-classifier](https://huggingface.co/ArkMaster123/grantpilot-classifier) | V1: federal-only classifier (AUC 0.837) |
| [grantpilot-classifier-v2](https://huggingface.co/ArkMaster123/grantpilot-classifier-v2) | V2: combined classifier (AUC 0.997) |
| [grantpilot-training-data](https://huggingface.co/datasets/ArkMaster123/grantpilot-training-data) | Training data (V1 at training/, V2 at training_v2/) |