---
license: apache-2.0
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - grant-matching
  - nonprofit
  - foundation-grants
base_model: Qwen/Qwen3-Embedding-0.6B
datasets:
  - ArkMaster123/grantpilot-training-data
language:
  - en
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---

# GrantPilot Embedding V2 (Federal + Foundation)

Fine-tuned Qwen3-Embedding-0.6B for grant-organization semantic matching. V2 extends coverage from federal-only (NIH/NSF) to include 37,684 private foundations.

See also: V1 (federal-only), which outperforms OpenAI on federal grant retrieval.

## Embedding Benchmark Results

Benchmarked on 998 test pairs (901 foundation, 78 NIH, 19 NSF) using retrieval and classification metrics.
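The evaluation script is not included in this repo, but the per-query retrieval metrics can be reproduced with a sketch like the following, assuming each test pair has exactly one relevant grant:

```python
import math

def retrieval_metrics(ranked_ids, relevant_id, ks=(1, 5, 10)):
    """Metrics for one query with a single relevant item (as in the 998 test pairs)."""
    try:
        rank = ranked_ids.index(relevant_id) + 1  # 1-based rank of the true grant
    except ValueError:
        rank = None  # relevant grant not retrieved at all
    metrics = {f"R@{k}": float(rank is not None and rank <= k) for k in ks}
    metrics["MRR"] = 1.0 / rank if rank else 0.0
    # With a single relevant item, NDCG@10 reduces to 1/log2(rank + 1) for rank <= 10
    metrics["NDCG@10"] = 1.0 / math.log2(rank + 1) if rank and rank <= 10 else 0.0
    return metrics
```

Averaging these per-query values over the test set yields the table below.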

### Retrieval Quality

| Model | Dim | R@1 | R@5 | R@10 | MRR | NDCG@10 |
|---|---|---|---|---|---|---|
| OpenAI text-embedding-3-small | 1536 | 0.343 | 0.570 | 0.682 | 0.453 | 0.499 |
| Qwen3-Embedding-0.6B (base) | 1024 | 0.295 | 0.514 | 0.630 | 0.403 | 0.449 |
| GrantPilot V2 (this model) | 1024 | 0.295 | 0.516 | 0.622 | 0.403 | 0.446 |

**Verdict:** OpenAI wins on retrieval. The fine-tuned V2 embedding performs on par with the base Qwen3 model; fine-tuning did not meaningfully improve retrieval on this mixed dataset. V1 (federal-only) significantly outperformed OpenAI on federal retrieval, but adding 90% foundation data diluted that specialization.

### AUC as Classifier Feature

| Model | Overall AUC | Foundation AUC | NIH AUC | NSF AUC |
|---|---|---|---|---|
| OpenAI text-embedding-3-small | 0.886 | 0.972 | 0.473 | 0.524 |
| Qwen3-Embedding-0.6B (base) | 0.881 | 0.965 | 0.611 | 0.548 |
| GrantPilot V2 (this model) | 0.881 | 0.965 | 0.614 | 0.548 |

**Interesting:** OpenAI has the best overall AUC but the worst federal AUC (0.473 on NIH, worse than random). Our fine-tuned model is best on federal grants.
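The per-subset AUC values can be computed by restricting positive/negative pairs to each funder type and scoring them with cosine similarity. A minimal, dependency-free sketch using the Mann-Whitney formulation of AUC (the actual evaluation code is not shown in this repo):

```python
def auc(pos_scores, neg_scores):
    """AUC via the Mann-Whitney U statistic: P(positive pair scores above negative pair)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos_scores) * len(neg_scores))
```

An AUC of 0.5 means the similarity score carries no signal for that subset; below 0.5 (as with OpenAI on NIH) means the score is anti-correlated with the label.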

### Inference Latency

| Model | Avg Latency | Cost |
|---|---|---|
| OpenAI text-embedding-3-small | 43.9ms | API cost |
| Qwen3-Embedding-0.6B (base) | 2.9ms | Free (self-hosted) |
| GrantPilot V2 (this model) | 1.7ms | Free (self-hosted) |

25x faster than OpenAI with zero API cost.
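Latency numbers like these can be reproduced with a simple warmup-then-average timing loop. A sketch (the exact benchmarking harness, batch size, and hardware settings are assumptions; pass in e.g. `lambda: model.encode([text])`):

```python
import time

def mean_latency_ms(fn, n_warmup=3, n_runs=20):
    """Average wall-clock latency of fn() in milliseconds, after warmup."""
    for _ in range(n_warmup):
        fn()  # warmup runs excluded from timing (caches, lazy init)
    start = time.perf_counter()
    for _ in range(n_runs):
        fn()
    return (time.perf_counter() - start) / n_runs * 1000.0
```

Warmup matters particularly for self-hosted models, where the first call pays for weight loading and kernel compilation.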

## Comparison with V1

| Metric | V1 vs OpenAI | V2 vs OpenAI |
|---|---|---|
| R@1 | V1 wins (+46%) | OpenAI wins |
| R@5 | V1 wins (+22%) | OpenAI wins |
| R@10 | V1 wins (+28%) | OpenAI wins |

V1 beat OpenAI decisively on federal grants. V2 lost that edge by training on a dataset that is 90% foundation data.

## Why Use This Model?

The embedding alone is not the star; the real value comes from the XGBoost classifier built on top:

| Classifier Metric | V1 | V2 |
|---|---|---|
| Overall AUC | 0.837 | 0.997 |
| Federal AUC | 0.837 | 0.913 |
| Accuracy | 72.1% | 98.3% |
| F1 | 0.595 | 0.983 |

See: grantpilot-classifier-v2

## Training Details

- **Hardware:** NVIDIA H100 80GB
- **Training Steps:** 1,000 (LoRA fine-tuning)
- **Training Pairs:** 324,479 positive pairs
- **LoRA Config:** r=16, alpha=32, target=q/k/v/o projections
- **Batch Size:** 32 (×4 gradient accumulation = 128 effective)
- **Learning Rate:** 2e-5
- **Final Val Loss:** 0.1458
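The LoRA settings above correspond to a `peft` configuration along these lines (a sketch only; the actual training script is not included here, and the `q_proj`/`k_proj`/`v_proj`/`o_proj` module names assume the standard Qwen3 attention projection naming):

```python
from peft import LoraConfig, TaskType

# LoRA hyperparameters from the list above: r=16, alpha=32,
# applied to the attention query/key/value/output projections.
lora_config = LoraConfig(
    task_type=TaskType.FEATURE_EXTRACTION,
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```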

### Training Data Composition

| Source | Pairs | % |
|---|---|---|
| Foundation (990-PF) | 292,401 | 90.1% |
| NIH | 25,717 | 7.9% |
| NSF | 6,361 | 2.0% |

## Usage

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("ArkMaster123/grantpilot-embedding-v2", trust_remote_code=True)

org_text = "Organization: Ford Foundation\nLocation: New York, NY\nType: FOUNDATION"
grant_text = "Grant: Support for civil society organizations\nAmount: $500,000"

# Normalize so the dot product below is cosine similarity
embeddings = model.encode([org_text, grant_text], normalize_embeddings=True)
similarity = embeddings[0] @ embeddings[1]
```
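For retrieval over many grants, you would typically encode the grant texts once and rank them against each organization embedding. A NumPy-only sketch (the `rank_grants` helper is hypothetical, not part of this repo; it expects arrays such as those returned by `model.encode`):

```python
import numpy as np

def rank_grants(org_emb, grant_embs):
    """Rank grants by cosine similarity to a single organization embedding."""
    org = org_emb / np.linalg.norm(org_emb)
    grants = grant_embs / np.linalg.norm(grant_embs, axis=1, keepdims=True)
    scores = grants @ org               # cosine similarity per grant
    order = np.argsort(-scores)         # best match first
    return order, scores[order]
```

The indices in `order` map back into whatever list of grant texts was encoded, so the top entries are the best-matching grants for the organization.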

## Related Models

| Model | Description |
|---|---|
| grantpilot-embedding | V1: federal-only, beats OpenAI on retrieval |
| grantpilot-classifier | V1: federal-only classifier (AUC 0.837) |
| grantpilot-classifier-v2 | V2: combined classifier (AUC 0.997) |
| grantpilot-training-data | Training data (V1 at `training/`, V2 at `training_v2/`) |