ArkMaster123 committed
Commit 661d4eb · verified · 1 parent: 33709a2

Update README with honest benchmark results and V1 comparison

Files changed (1): README.md +82 -656

README.md CHANGED
@@ -1,695 +1,121 @@
  ---
  tags:
- - sentence-transformers
- - sentence-similarity
- - feature-extraction
- - dense
- - generated_from_trainer
- - dataset_size:324479
- - loss:MultipleNegativesRankingLoss
  base_model: Qwen/Qwen3-Embedding-0.6B
- widget:
- - source_sentence: 'Organization: PKF O''CONNOR DAVIES ADVISORY LLC
-
- Location: NEW YORK, NY
-
- Type: FOUNDATION'
- sentences:
- - 'Grant: Grant to OCEANA INC
-
- Funder: PKF O''CONNOR DAVIES ADVISORY LLC (FOUNDATION)
-
- Amount: $150,000
-
- Description: Purpose: TO SUPPORT OCEANA''S WORK IN THE UK
-
- Recipient Location: WASHINGTON, DC
-
- Recipient Type: Public Charity
-
- Amount: $150,000'
- - 'Grant: Grant to RAINFOREST FOUNDATION INC
-
- Funder: BPM LLP (FOUNDATION)
-
- Amount: $100,000
-
- Description: Purpose: RAPID RESPONSE ADDRESSING THE NEEDS OF COMMUNITIES AFFECTED
- BY THE FIRES IN BELIZE.
-
- Recipient Location: BROOKLYN, NY
-
- Recipient Type: Public Charity
-
- Amount: $100,000'
- - 'Grant: Grant to ALONDRA ALVAREZ MURILLO
-
- Funder: REDWITZ INC (FOUNDATION)
-
- Amount: $300
-
- Description: Purpose: TEACHER GRATITUDE GRANT
-
- Recipient Location: EL CERRITO, CA
-
- Recipient Type: EDUCATIONAL INSTITUT
-
- Amount: $300'
- - source_sentence: 'Organization: Forvis Mazars LLP
-
- Location: Asheville, NC
-
- Type: FOUNDATION'
- sentences:
- - 'Grant: Grant to Globe Santa - The Boston Globe Foundation
-
- Funder: Forvis Mazars LLP (FOUNDATION)
-
- Amount: $2,000
-
- Description: Purpose: To provide general support
-
- Recipient Location: Boston, MA
-
- Recipient Type: Public Charity
-
- Amount: $2,000'
- - 'Grant: Grant to TRIBAL ECO RESTORATION ALLIANCE
-
- Funder: Foundation Source (FOUNDATION)
-
- Amount: $20,000
-
- Description: Purpose: General & Unrestricted
-
- Recipient Location: UPPER LAKE, CA
-
- Recipient Type: Public Charity
-
- Amount: $20,000'
- - 'Grant: Assessing the spatial and temporal scales of attention effects and attention-dependent
- cholinergic release in macque V4.
-
- Funder: National Eye Institute (FEDERAL)
-
- Amount: $41,749
-
- Description: Explicitly or implicitly, there are currently three competing models
- for the role of the neuromodulator acetylcholine (ACh) in attention. The first
- asserts that the cholinergic system is spatially imprecise and contributes to
- a mechanism for arousal but not attention. The second states that the cholinergic
- system is spatially imprecise and is one component of the mechanism for attention.
- The third states that the cholinergic system is at the center of the mechanism
- for attention (implying the sy...'
- - source_sentence: 'Organization: WITHUMSMITHBROWNPC
-
- Location: NEW YORK, NY
-
- Type: FOUNDATION'
- sentences:
- - 'Grant: Grant to XERCES SOCIETY INC
-
- Funder: WEAVER AND TIDWELL LLP (FOUNDATION)
-
- Amount: $200
-
- Description: Purpose: TO FURTHER THE ORGANIZATIONS CHARITABLE OBJECTIVES
-
- Recipient Location: NEW YORK, NY
-
- Recipient Type: EXEMPT
-
- Amount: $200'
- - 'Grant: Grant to NOOGA QUEEN BEE COOPERATIVE
-
- Funder: HEMENWAY & BARNES LLP (FOUNDATION)
-
- Amount: $1,528
-
- Description: Purpose: FURTHERING EDUCATION WITH RESPECT TO SCIENCE POLICY AND
- BEEKEEPING.
-
- Recipient Location: CHATTANOOGA, TN
-
- Recipient Type: Non-Charity
-
- Amount: $1,528'
- - 'Grant: Grant to Institute for Ag & Trade Policy
-
- Funder: WITHUMSMITHBROWNPC (FOUNDATION)
-
- Amount: $30,000
-
- Description: Purpose: Transform Food Systems
-
- Recipient Location: Minneapolis, MN
-
- Recipient Type: Public Charity
-
- Amount: $30,000'
- - source_sentence: 'Organization: GRANT THORNTON ADVISORS LLC
-
- Location: BOSTON, MA
-
- Type: FOUNDATION'
- sentences:
- - 'Grant: Grant to SAN JUAN ROTARY FOUNDATION INC
-
- Funder: PKF O''CONNOR DAVIES ADVISORY LLC (FOUNDATION)
-
- Amount: $2,000
-
- Description: Purpose: VOLUNTEER INCENTIVE PROGRAM
-
- Recipient Location: FARMINGTON, NM
-
- Recipient Type: Public Charity
-
- Amount: $2,000'
- - 'Grant: Grant to BROWN UNIVERSITY
-
- Funder: GRANT THORNTON ADVISORS LLC (FOUNDATION)
-
- Amount: $400
-
- Description: Purpose: FIDELITY MATCHING GIFTS TO EDUCATION
-
- Recipient Location: PROVIDENCE, RI
-
- Recipient Type: Public Charity
-
- Amount: $400'
- - 'Grant: Experimental Study of a Model to Support Research Evidence Use for Protecting
- Children
-
- Funder: Eunice Kennedy Shriver National Institute of Child Health and Human Development
- (FEDERAL)
-
- Amount: $689,752
-
- Description: Project Summary Protecting children through the primary prevention
- of child abuse and neglect (CAN) is a major priority given that an estimated 1
- in 7 children are affected each year in the U.S. and the societal cost of CAN
- is of over $400 billion. Even though there are many evidence-based programs to
- prevent abuse, reduce harm, and treat trauma, there remain numerous barriers for
- policymakers to craft scientifically-informed policies to protect children. Accordingly,
- we propose an experimental ...'
- - source_sentence: 'Organization: WITHUMSMITHBROWNPC
-
- Location: IRVINE, CA
-
- Type: FOUNDATION'
- sentences:
- - 'Grant: Grant to CENTER FOR LEADERSHIP DEVELOPMENT
-
- Funder: BGBC ADVISORY LLC (FOUNDATION)
-
- Amount: $1,000
-
- Description: Purpose: TO FOSTER THE ADVANCEMENT OF MINORITY YOUTH IN CENTRAL INDIANA
- AS FUTURE PROFESSIONAL, BUSINESS AND COMMUNITY LEADERS BY PROVIDING EXPERIENCES
- THAT ENCOURAGE PERSONAL DEVELOPMENT AND EDUCATIONAL ATTAINMENT.
-
- Recipient Location: INDIANAPOLIS, IN
-
- Recipient Type: PUBLIC CHARITY
-
- Amount: $1,000'
- - 'Grant: Grant to Santa Barbara Botanic Garden
-
- Funder: WITHUMSMITHBROWNPC (FOUNDATION)
-
- Amount: $2,150
-
- Description: Purpose: TO FURTHER THE AGENDA OF THE ORGANIZATION.
-
- Recipient Location: Santa Barbara, CA
-
- Recipient Type: Public Charity
-
- Amount: $2,150'
- - 'Grant: Grant to INTERNATIONAL RESCUE COMMITTEE INC
-
- Funder: CLARK NUBER PS (FOUNDATION)
-
- Amount: $200,000
-
- Description: Purpose: ENSURING THE RIGHT TO HUMANITARIAN ASSISTANCE IN EAST AFRICA
-
- Recipient Location: NEW YORK, NY
-
- Recipient Type: Public Charity
-
- Amount: $200,000'
  pipeline_tag: sentence-similarity
  library_name: sentence-transformers
- metrics:
- - pearson_cosine
- - spearman_cosine
- model-index:
- - name: SentenceTransformer based on Qwen/Qwen3-Embedding-0.6B
- results:
- - task:
- type: semantic-similarity
- name: Semantic Similarity
- dataset:
- name: val similarity
- type: val-similarity
- metrics:
- - type: pearson_cosine
- value: .nan
- name: Pearson Cosine
- - type: spearman_cosine
- value: .nan
- name: Spearman Cosine
  ---

- # SentenceTransformer based on Qwen/Qwen3-Embedding-0.6B

- This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [Qwen/Qwen3-Embedding-0.6B](https://huggingface.co/Qwen/Qwen3-Embedding-0.6B). It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

- ## Model Details

- ### Model Description
- - **Model Type:** Sentence Transformer
- - **Base model:** [Qwen/Qwen3-Embedding-0.6B](https://huggingface.co/Qwen/Qwen3-Embedding-0.6B) <!-- at revision c54f2e6e80b2d7b7de06f51cec4959f6b3e03418 -->
- - **Maximum Sequence Length:** 512 tokens
- - **Output Dimensionality:** 1024 dimensions
- - **Similarity Function:** Cosine Similarity
- <!-- - **Training Dataset:** Unknown -->
- <!-- - **Language:** Unknown -->
- <!-- - **License:** Unknown -->

- ### Model Sources

- - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- - **Repository:** [Sentence Transformers on GitHub](https://github.com/huggingface/sentence-transformers)
- - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

- ### Full Model Architecture

- ```
- SentenceTransformer(
- (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'PeftModelForFeatureExtraction'})
- (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': True, 'include_prompt': True})
- (2): Normalize()
- )
- ```

- ## Usage
-
- ### Direct Usage (Sentence Transformers)
-
- First install the Sentence Transformers library:
-
- ```bash
- pip install -U sentence-transformers
- ```

- Then you can load this model and run inference.
- ```python
- from sentence_transformers import SentenceTransformer

- # Download from the 🤗 Hub
- model = SentenceTransformer("sentence_transformers_model_id")
- # Run inference
- queries = [
- "Organization: WITHUMSMITHBROWNPC\nLocation: IRVINE, CA\nType: FOUNDATION",
- ]
- documents = [
- 'Grant: Grant to Santa Barbara Botanic Garden\nFunder: WITHUMSMITHBROWNPC (FOUNDATION)\nAmount: $2,150\nDescription: Purpose: TO FURTHER THE AGENDA OF THE ORGANIZATION.\nRecipient Location: Santa Barbara, CA\nRecipient Type: Public Charity\nAmount: $2,150',
- 'Grant: Grant to INTERNATIONAL RESCUE COMMITTEE INC\nFunder: CLARK NUBER PS (FOUNDATION)\nAmount: $200,000\nDescription: Purpose: ENSURING THE RIGHT TO HUMANITARIAN ASSISTANCE IN EAST AFRICA\nRecipient Location: NEW YORK, NY\nRecipient Type: Public Charity\nAmount: $200,000',
- 'Grant: Grant to CENTER FOR LEADERSHIP DEVELOPMENT\nFunder: BGBC ADVISORY LLC (FOUNDATION)\nAmount: $1,000\nDescription: Purpose: TO FOSTER THE ADVANCEMENT OF MINORITY YOUTH IN CENTRAL INDIANA AS FUTURE PROFESSIONAL, BUSINESS AND COMMUNITY LEADERS BY PROVIDING EXPERIENCES THAT ENCOURAGE PERSONAL DEVELOPMENT AND EDUCATIONAL ATTAINMENT.\nRecipient Location: INDIANAPOLIS, IN\nRecipient Type: PUBLIC CHARITY\nAmount: $1,000',
- ]
- query_embeddings = model.encode_query(queries)
- document_embeddings = model.encode_document(documents)
- print(query_embeddings.shape, document_embeddings.shape)
- # [1, 1024] [3, 1024]
-
- # Get the similarity scores for the embeddings
- similarities = model.similarity(query_embeddings, document_embeddings)
- print(similarities)
- # tensor([[0.7437, 0.0331, 0.0600]])
- ```

- <!--
- ### Direct Usage (Transformers)

- <details><summary>Click to see the direct usage in Transformers</summary>

- </details>
- -->

- <!--
- ### Downstream Usage (Sentence Transformers)

- You can finetune this model on your own dataset.

- <details><summary>Click to expand</summary>

- </details>
- -->

- <!--
- ### Out-of-Scope Use

- *List how the model may foreseeably be misused and address what users ought not to do with the model.*
- -->
-
- ## Evaluation
-
- ### Metrics
-
- #### Semantic Similarity
-
- * Dataset: `val-similarity`
- * Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
-
- | Metric | Value |
- |:--------------------|:--------|
- | pearson_cosine | nan |
- | **spearman_cosine** | **nan** |
-
- <!--
- ## Bias, Risks and Limitations
-
- *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
- -->
-
- <!--
- ### Recommendations

- *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
- -->

  ## Training Details

- ### Training Dataset
-
- #### Unnamed Dataset
-
- * Size: 324,479 training samples
- * Columns: <code>anchor</code> and <code>positive</code>
- * Approximate statistics based on the first 1000 samples:
- | | anchor | positive |
- |:--------|:--------|:--------|
- | type | string | string |
- | details | <ul><li>min: 16 tokens</li><li>mean: 23.39 tokens</li><li>max: 41 tokens</li></ul> | <ul><li>min: 46 tokens</li><li>mean: 83.4 tokens</li><li>max: 192 tokens</li></ul> |
- * Samples:
- | anchor | positive |
- |:-------|:---------|
- | <code>Organization: DELOITTE TAX LLP<br>Location: MINNEAPOLIS, MN<br>Type: FOUNDATION</code> | <code>Grant: Grant to WORLD HEALTH ORGANIZATION<br>Funder: DELOITTE TAX LLP (FOUNDATION)<br>Amount: $450,000<br>Description: Purpose: RESEARCH AND LEARNING OPPORTUNITIES<br>Recipient Type: GOV: EXECUTIVE ORDER<br>Amount: $450,000</code> |
- | <code>Organization: Berry Dunn McNeil &amp; Parker LLC<br>Location: Portland, ME<br>Type: FOUNDATION</code> | <code>Grant: Grant to Museum of Fine Arts<br>Funder: Berry Dunn McNeil &amp; Parker LLC (FOUNDATION)<br>Amount: $3,000<br>Description: Purpose: Operations budget assistance<br>Recipient Location: Boston, MA<br>Recipient Type: Public Charity<br>Amount: $3,000</code> |
- | <code>Organization: Aprio Advisory Group LLC<br>Location: Greenwood Village, CO<br>Type: FOUNDATION</code> | <code>Grant: Grant to Safehouse Denver Inc<br>Funder: Aprio Advisory Group LLC (FOUNDATION)<br>Amount: $5,000<br>Description: Purpose: Survivors of domestic violence<br>Recipient Location: Denver, CO<br>Recipient Type: Public<br>Amount: $5,000</code> |
- * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
- ```json
- {
- "scale": 20.0,
- "similarity_fct": "cos_sim",
- "gather_across_devices": false
- }
- ```
-
- ### Evaluation Dataset
-
- #### Unnamed Dataset
-
- * Size: 40,559 evaluation samples
- * Columns: <code>anchor</code> and <code>positive</code>
- * Approximate statistics based on the first 1000 samples:
- | | anchor | positive |
- |:--------|:--------|:--------|
- | type | string | string |
- | details | <ul><li>min: 16 tokens</li><li>mean: 23.62 tokens</li><li>max: 37 tokens</li></ul> | <ul><li>min: 47 tokens</li><li>mean: 83.31 tokens</li><li>max: 191 tokens</li></ul> |
- * Samples:
- | anchor | positive |
- |:-------|:---------|
- | <code>Organization: O'CONNOR MALONEY &amp; CO CPA'S<br>Location: WORCESTER, MA<br>Type: FOUNDATION</code> | <code>Grant: Grant to NIKOLAS KOJOIAN<br>Funder: O'CONNOR MALONEY &amp; CO CPA'S (FOUNDATION)<br>Amount: $3,500<br>Description: Purpose: EDUCATIONAL SCHOLARSHIP<br>Recipient Location: NORTH ATTLEBORO, MA<br>Recipient Type: I<br>Amount: $3,500</code> |
- | <code>Organization: WALTON ENTERPRISES LLC<br>Location: BENTONVILLE, AR<br>Type: FOUNDATION</code> | <code>Grant: Grant to Student Achievement Partners Inc<br>Funder: WALTON ENTERPRISES LLC (FOUNDATION)<br>Amount: $429,272<br>Description: Purpose: To develop and disseminate high-quality math and literacy instructional materials to educators and publishers that accelerate student learning.<br>Recipient Location: New York, NY<br>Recipient Type: Public Charity<br>Amount: $429,272</code> |
- | <code>Organization: FRAZIER &amp; FRAZIER ATTYS<br>Location: Jacksonville, FL<br>Type: FOUNDATION</code> | <code>Grant: Grant to Cathedral Arts Project<br>Funder: FRAZIER &amp; FRAZIER ATTYS (FOUNDATION)<br>Amount: $2,500<br>Description: Purpose: To provide unrestricted general operating support to fulfill their mission<br>Recipient Location: Jacksonville, FL<br>Recipient Type: Public Charity<br>Amount: $2,500</code> |
- * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
- ```json
- {
- "scale": 20.0,
- "similarity_fct": "cos_sim",
- "gather_across_devices": false
- }
- ```
-
- ### Training Hyperparameters
- #### Non-Default Hyperparameters
-
- - `per_device_train_batch_size`: 32
- - `num_train_epochs`: 1
- - `max_steps`: 1000
- - `learning_rate`: 2e-05
- - `warmup_steps`: 0.1
- - `weight_decay`: 0.01
- - `gradient_accumulation_steps`: 4
- - `fp16`: True
- - `eval_strategy`: steps
- - `per_device_eval_batch_size`: 32
- - `dataloader_num_workers`: 4
- - `warmup_ratio`: 0.1
- - `batch_sampler`: no_duplicates
-
- #### All Hyperparameters
- <details><summary>Click to expand</summary>
-
- - `per_device_train_batch_size`: 32
- - `num_train_epochs`: 1
- - `max_steps`: 1000
- - `learning_rate`: 2e-05
- - `lr_scheduler_type`: linear
- - `lr_scheduler_kwargs`: None
- - `warmup_steps`: 0.1
- - `optim`: adamw_torch_fused
- - `optim_args`: None
- - `weight_decay`: 0.01
- - `adam_beta1`: 0.9
- - `adam_beta2`: 0.999
- - `adam_epsilon`: 1e-08
- - `optim_target_modules`: None
- - `gradient_accumulation_steps`: 4
- - `average_tokens_across_devices`: True
- - `max_grad_norm`: 1.0
- - `label_smoothing_factor`: 0.0
- - `bf16`: False
- - `fp16`: True
- - `bf16_full_eval`: False
- - `fp16_full_eval`: False
- - `tf32`: None
- - `gradient_checkpointing`: False
- - `gradient_checkpointing_kwargs`: None
- - `torch_compile`: False
- - `torch_compile_backend`: None
- - `torch_compile_mode`: None
- - `use_liger_kernel`: False
- - `liger_kernel_config`: None
- - `use_cache`: False
- - `neftune_noise_alpha`: None
- - `torch_empty_cache_steps`: None
- - `auto_find_batch_size`: False
- - `log_on_each_node`: True
- - `logging_nan_inf_filter`: True
- - `include_num_input_tokens_seen`: no
- - `log_level`: passive
- - `log_level_replica`: warning
- - `disable_tqdm`: False
- - `project`: huggingface
- - `trackio_space_id`: trackio
- - `eval_strategy`: steps
- - `per_device_eval_batch_size`: 32
- - `prediction_loss_only`: True
- - `eval_on_start`: False
- - `eval_do_concat_batches`: True
- - `eval_use_gather_object`: False
- - `eval_accumulation_steps`: None
- - `include_for_metrics`: []
- - `batch_eval_metrics`: False
- - `save_only_model`: False
- - `save_on_each_node`: False
- - `enable_jit_checkpoint`: False
- - `push_to_hub`: False
- - `hub_private_repo`: None
- - `hub_model_id`: None
- - `hub_strategy`: every_save
- - `hub_always_push`: False
- - `hub_revision`: None
- - `load_best_model_at_end`: False
- - `ignore_data_skip`: False
- - `restore_callback_states_from_checkpoint`: False
- - `full_determinism`: False
- - `seed`: 42
- - `data_seed`: None
- - `use_cpu`: False
- - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- - `parallelism_config`: None
- - `dataloader_drop_last`: False
- - `dataloader_num_workers`: 4
- - `dataloader_pin_memory`: True
- - `dataloader_persistent_workers`: False
- - `dataloader_prefetch_factor`: None
- - `remove_unused_columns`: True
- - `label_names`: None
- - `train_sampling_strategy`: random
- - `length_column_name`: length
- - `ddp_find_unused_parameters`: None
- - `ddp_bucket_cap_mb`: None
- - `ddp_broadcast_buffers`: False
- - `ddp_backend`: None
- - `ddp_timeout`: 1800
- - `fsdp`: []
- - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- - `deepspeed`: None
- - `debug`: []
- - `skip_memory_metrics`: True
- - `do_predict`: False
- - `resume_from_checkpoint`: None
- - `warmup_ratio`: 0.1
- - `local_rank`: -1
- - `prompts`: None
- - `batch_sampler`: no_duplicates
- - `multi_dataset_batch_sampler`: proportional
- - `router_mapping`: {}
- - `learning_rate_mapping`: {}
-
- </details>
-
- ### Training Logs
- | Epoch | Step | Training Loss | Validation Loss | val-similarity_spearman_cosine |
- |:------:|:----:|:-------------:|:---------------:|:------------------------------:|
- | 0.0099 | 25 | 1.7643 | - | - |
- | 0.0197 | 50 | 1.0715 | - | - |
- | 0.0296 | 75 | 0.4669 | - | - |
- | 0.0394 | 100 | 0.3204 | 0.2283 | nan |
- | 0.0493 | 125 | 0.3101 | - | - |
- | 0.0592 | 150 | 0.2830 | - | - |
- | 0.0690 | 175 | 0.3010 | - | - |
- | 0.0789 | 200 | 0.2790 | 0.2096 | nan |
- | 0.0888 | 225 | 0.2919 | - | - |
- | 0.0986 | 250 | 0.2608 | - | - |
- | 0.1085 | 275 | 0.2796 | - | - |
- | 0.1183 | 300 | 0.2559 | 0.1940 | nan |
- | 0.1282 | 325 | 0.2376 | - | - |
- | 0.1381 | 350 | 0.2491 | - | - |
- | 0.1479 | 375 | 0.2307 | - | - |
- | 0.1578 | 400 | 0.2233 | 0.1824 | nan |
- | 0.1677 | 425 | 0.2385 | - | - |
- | 0.1775 | 450 | 0.2356 | - | - |
- | 0.1874 | 475 | 0.2295 | - | - |
- | 0.1972 | 500 | 0.2104 | 0.1721 | nan |
- | 0.2071 | 525 | 0.2117 | - | - |
- | 0.2170 | 550 | 0.2100 | - | - |
- | 0.2268 | 575 | 0.2462 | - | - |
- | 0.2367 | 600 | 0.2402 | 0.1648 | nan |
- | 0.2465 | 625 | 0.1954 | - | - |
- | 0.2564 | 650 | 0.1890 | - | - |
- | 0.2663 | 675 | 0.2182 | - | - |
- | 0.2761 | 700 | 0.1878 | 0.1590 | nan |
- | 0.2860 | 725 | 0.2252 | - | - |
- | 0.2959 | 750 | 0.1886 | - | - |
- | 0.3057 | 775 | 0.1879 | - | - |
- | 0.3156 | 800 | 0.2009 | 0.1516 | nan |
- | 0.3254 | 825 | 0.1880 | - | - |
- | 0.3353 | 850 | 0.1872 | - | - |
- | 0.3452 | 875 | 0.1973 | - | - |
- | 0.3550 | 900 | 0.1944 | 0.1474 | nan |
- | 0.3649 | 925 | 0.1960 | - | - |
- | 0.3748 | 950 | 0.1993 | - | - |
- | 0.3846 | 975 | 0.1891 | - | - |
- | 0.3945 | 1000 | 0.1971 | 0.1458 | nan |
-
- ### Framework Versions
- - Python: 3.11.12
- - Sentence Transformers: 5.2.3
- - Transformers: 5.2.0
- - PyTorch: 2.10.0+cu128
- - Accelerate: 1.12.0
- - Datasets: 4.6.0
- - Tokenizers: 0.22.2
-
- ## Citation
-
- ### BibTeX
-
- #### Sentence Transformers
- ```bibtex
- @inproceedings{reimers-2019-sentence-bert,
- title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
- author = "Reimers, Nils and Gurevych, Iryna",
- booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
- month = "11",
- year = "2019",
- publisher = "Association for Computational Linguistics",
- url = "https://arxiv.org/abs/1908.10084",
- }
- ```
-
- #### MultipleNegativesRankingLoss
- ```bibtex
- @misc{henderson2017efficient,
- title={Efficient Natural Language Response Suggestion for Smart Reply},
- author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
- year={2017},
- eprint={1705.00652},
- archivePrefix={arXiv},
- primaryClass={cs.CL}
- }
- ```
-
- <!--
- ## Glossary
-
- *Clearly define terms in order to be accessible across audiences.*
- -->
-
- <!--
- ## Model Card Authors
-
- *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
- -->
-
- <!--
- ## Model Card Contact
-
- *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
- -->
-
- ---
-
- ## V2.0 Update: Foundation Grants Support (February 2026)
-
- ### What Changed
-
- V2 retrains the embedding model on **combined federal + foundation data**. The training set grew from federal-only pairs to **324,479 positive pairs** spanning NIH, NSF, and 37,684 private foundations.

- The model now understands the semantic relationship between:
- - **Federal grants**: Organization research profiles matched to NIH/NSF funding opportunities
- - **Foundation grants**: Foundation profiles matched to their actual grantmaking (recipient, purpose, amount)

- ### Training Details (V2)

- - **Hardware**: NVIDIA H100 80GB HBM3
- - **Training Steps**: 1,000 (LoRA fine-tuning)
- - **Base Model**: Qwen/Qwen3-Embedding-0.6B
- - **LoRA Config**: r=16, alpha=32, target=q/k/v/o projections
- - **Effective Batch Size**: 128 (32 x 4 gradient accumulation)
- - **Final Validation Loss**: 0.1458 (steadily decreasing from 0.2283)

- ### Downstream Impact

- When used as the similarity feature for the XGBoost classifier:

- | Metric | V1 (Federal Only) | V2 (Combined) |
- |--------|-------------------|---------------|
- | Overall AUC | 0.837 | **0.997** |
- | Federal AUC | 0.837 | **0.913** |

- The foundation-aware embeddings improved performance across the board, including on federal-only test data.

- ### Version Tags

- - `v1.0-federal-only`: Trained on NIH + NSF data only
- - `v2.0-with-foundations`: Trained on NIH + NSF + 37K foundation grants
  ---
+ license: apache-2.0
  tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - grant-matching
+ - nonprofit
+ - foundation-grants
  base_model: Qwen/Qwen3-Embedding-0.6B
+ datasets:
+ - ArkMaster123/grantpilot-training-data
+ language:
+ - en
  pipeline_tag: sentence-similarity
  library_name: sentence-transformers
  ---

+ # GrantPilot Embedding V2 (Federal + Foundation)

+ Fine-tuned [Qwen3-Embedding-0.6B](https://huggingface.co/Qwen/Qwen3-Embedding-0.6B) for grant-organization semantic matching. V2 extends coverage from federal-only (NIH/NSF) to include **37,684 private foundations**.

+ > **See also:** [V1 (federal-only)](https://huggingface.co/ArkMaster123/grantpilot-embedding), which outperforms OpenAI on federal grant retrieval.

+ ## Embedding Benchmark Results

+ Benchmarked on 998 test pairs (901 foundation, 78 NIH, 19 NSF) using retrieval and classification metrics.

+ ### Retrieval Quality

+ | Model | Dim | R@1 | R@5 | R@10 | MRR | NDCG@10 |
+ |-------|-----|-----|-----|------|-----|---------|
+ | OpenAI text-embedding-3-small | 1536 | **0.343** | **0.570** | **0.682** | **0.453** | **0.499** |
+ | Qwen3-Embedding-0.6B (base) | 1024 | 0.295 | 0.514 | 0.630 | 0.403 | 0.449 |
+ | **GrantPilot V2 (this model)** | 1024 | 0.295 | 0.516 | 0.622 | 0.403 | 0.446 |

+ **Verdict: OpenAI wins on retrieval.** The fine-tuned V2 embedding performs on par with the base Qwen3 model — fine-tuning did not meaningfully improve retrieval on this mixed dataset. V1 (federal-only) significantly outperformed OpenAI on federal retrieval, but adding 90% foundation data diluted that specialization.
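
+ For reference, the retrieval numbers above follow the usual one-positive-per-query protocol: each test organization has exactly one paired grant, and all candidate grants are ranked by cosine similarity. A minimal sketch of how such metrics can be computed (illustrative only, not the exact benchmark script; assumes L2-normalized embedding matrices):

+ ```python
+ import numpy as np
+ 
+ def retrieval_metrics(query_emb, doc_emb, ks=(1, 5, 10)):
+     """query_emb[i] is relevant to doc_emb[i] only (one positive per query)."""
+     sims = query_emb @ doc_emb.T  # cosine similarity; embeddings are pre-normalized
+     order = (-sims).argsort(axis=1)  # doc indices, best match first
+     # 0-based rank of the true document for each query
+     ranks = (order == np.arange(len(sims))[:, None]).argmax(axis=1)
+     metrics = {f"R@{k}": float((ranks < k).mean()) for k in ks}
+     metrics["MRR"] = float((1.0 / (ranks + 1)).mean())
+     # one relevant doc => ideal DCG is 1, so NDCG@10 averages 1/log2(rank + 2) over hits
+     metrics["NDCG@10"] = float(np.where(ranks < 10, 1.0 / np.log2(ranks + 2), 0.0).mean())
+     return metrics
+ ```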

+ ### AUC as Classifier Feature

+ | Model | Overall AUC | Foundation AUC | NIH AUC | NSF AUC |
+ |-------|-------------|----------------|---------|---------|
+ | OpenAI text-embedding-3-small | **0.886** | **0.972** | 0.473 | 0.524 |
+ | Qwen3-Embedding-0.6B (base) | 0.881 | 0.965 | 0.611 | 0.548 |
+ | **GrantPilot V2 (this model)** | 0.881 | 0.965 | **0.614** | 0.548 |

+ Interestingly, OpenAI has the best overall AUC but the **worst federal AUC** (0.47 on NIH, worse than random). Our fine-tuned model is the strongest on federal grants.
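
+ "AUC as classifier feature" here means scoring each labeled (organization, grant) pair by embedding cosine similarity alone and checking how well that single score separates true matches from negatives. A hedged sketch of that measurement (names are illustrative, not the benchmark code):

+ ```python
+ import numpy as np
+ from sklearn.metrics import roc_auc_score
+ 
+ def similarity_auc(org_vecs: np.ndarray, grant_vecs: np.ndarray, labels: np.ndarray) -> float:
+     """org_vecs/grant_vecs are row-aligned pair embeddings; labels are 1 (match) / 0 (negative)."""
+     scores = (org_vecs * grant_vecs).sum(axis=1)  # dot product = cosine for normalized embeddings
+     return roc_auc_score(labels, scores)
+ ```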

+ ### Inference Latency

+ | Model | Avg Latency | Cost |
+ |-------|-------------|------|
+ | OpenAI text-embedding-3-small | 43.9ms | API cost |
+ | Qwen3-Embedding-0.6B (base) | 2.9ms | Free (self-hosted) |
+ | **GrantPilot V2 (this model)** | **1.7ms** | Free (self-hosted) |

+ **25x faster than OpenAI** with zero API cost.
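
+ Latency above is per single-text encode. A sketch of the kind of harness behind such numbers (absolute values depend heavily on hardware, batch size, and sequence length):

+ ```python
+ import time
+ 
+ def avg_encode_latency_ms(model, text: str, n: int = 100) -> float:
+     model.encode(text)  # warm-up: the first call pays tokenizer/CUDA initialization
+     start = time.perf_counter()
+     for _ in range(n):
+         model.encode(text)
+     return (time.perf_counter() - start) * 1000.0 / n
+ ```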

+ ### Comparison with V1

+ | Metric | V1 vs OpenAI | V2 vs OpenAI |
+ |--------|--------------|--------------|
+ | R@1 | **V1 wins (+46%)** | OpenAI wins |
+ | R@5 | **V1 wins (+22%)** | OpenAI wins |
+ | R@10 | **V1 wins (+28%)** | OpenAI wins |

+ V1 beat OpenAI decisively on federal grants. V2 lost that edge by training on a dataset that is 90% foundation data.

+ ## Why Use This Model?

+ The embedding alone is not the star — the **XGBoost classifier built on top** is where the real value comes from:

+ | Classifier Metric | V1 | V2 |
+ |-------------------|----|----|
+ | Overall AUC | 0.837 | **0.997** |
+ | Federal AUC | 0.837 | **0.913** |
+ | Accuracy | 72.1% | **98.3%** |
+ | F1 | 0.595 | **0.983** |

+ See: [grantpilot-classifier-v2](https://huggingface.co/ArkMaster123/grantpilot-classifier-v2)
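
+ To make the division of labor concrete: the embedding supplies a similarity score, and the classifier learns when that score (possibly alongside other pair features) indicates a real match. A sketch of that setup (the feature set and hyperparameters here are illustrative, not the published classifier's):

+ ```python
+ import numpy as np
+ import xgboost as xgb
+ 
+ def train_match_classifier(sims: np.ndarray, extra_features: np.ndarray, y: np.ndarray):
+     """sims: embedding cosine similarity per pair; extra_features: any additional pair features."""
+     X = np.column_stack([sims, extra_features])
+     clf = xgb.XGBClassifier(n_estimators=300, max_depth=6, eval_metric="auc")
+     clf.fit(X, y)  # y: 1 for a true org-grant match, 0 for a negative pair
+     return clf
+ ```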

  ## Training Details

+ - **Hardware**: NVIDIA H100 80GB
+ - **Training Steps**: 1,000 (LoRA fine-tuning)
+ - **Training Pairs**: 324,479 positive pairs
+ - **LoRA Config**: r=16, alpha=32, target=q/k/v/o projections
+ - **Batch Size**: 32 (×4 gradient accumulation = 128 effective)
+ - **Learning Rate**: 2e-5
+ - **Final Val Loss**: 0.1458
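
+ A minimal sketch of this recipe, assuming the sentence-transformers PEFT integration (`model.add_adapter`) and the `MultipleNegativesRankingLoss` (in-batch negatives, cosine scaled by 20) listed in the original card; the actual training script may differ:

+ ```python
+ from peft import LoraConfig, TaskType
+ from sentence_transformers import SentenceTransformer, losses
+ 
+ model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")
+ # LoRA on the attention projections only, matching the config above
+ model.add_adapter(LoraConfig(
+     task_type=TaskType.FEATURE_EXTRACTION,
+     r=16,
+     lora_alpha=32,
+     target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
+ ))
+ # (anchor, positive) pairs; other positives in the batch serve as negatives
+ loss = losses.MultipleNegativesRankingLoss(model, scale=20.0)
+ ```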

+ ### Training Data Composition

+ | Source | Pairs | % |
+ |--------|-------|---|
+ | Foundation (990-PF) | 292,401 | 90.1% |
+ | NIH | 25,717 | 7.9% |
+ | NSF | 6,361 | 2.0% |

+ ## Usage

+ ```python
+ from sentence_transformers import SentenceTransformer
+ 
+ model = SentenceTransformer("ArkMaster123/grantpilot-embedding-v2", trust_remote_code=True)
+ 
+ org_text = "Organization: Ford Foundation\nLocation: New York, NY\nType: FOUNDATION"
+ grant_text = "Grant: Support for civil society organizations\nAmount: $500,000"
+ 
+ # Embeddings come out L2-normalized, so a plain dot product is a cosine score
+ embeddings = model.encode([org_text, grant_text])
+ similarity = embeddings[0] @ embeddings[1]
+ ```
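
+ For retrieval-style use, the dedicated query/document encoders and the built-in similarity utility from the previous model card also apply here:

+ ```python
+ # Alternative: query/document helpers plus the similarity util (returns cosine scores)
+ query_embeddings = model.encode_query([org_text])
+ document_embeddings = model.encode_document([grant_text])
+ print(model.similarity(query_embeddings, document_embeddings))
+ ```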

+ ## Related Models

+ | Model | Description |
+ |-------|-------------|
+ | [grantpilot-embedding](https://huggingface.co/ArkMaster123/grantpilot-embedding) | V1 — federal-only, beats OpenAI on retrieval |
+ | [grantpilot-classifier](https://huggingface.co/ArkMaster123/grantpilot-classifier) | V1 — federal-only classifier (AUC 0.837) |
+ | [grantpilot-classifier-v2](https://huggingface.co/ArkMaster123/grantpilot-classifier-v2) | V2 — combined classifier (AUC 0.997) |
+ | [grantpilot-training-data](https://huggingface.co/datasets/ArkMaster123/grantpilot-training-data) | Training data (V1 at training/, V2 at training_v2/) |