---
language:
- en
- multilingual
license: apache-2.0
tags:
- cross-encoder
- reranker
- sentence-transformers
- ror
- affiliation-matching
base_model: cross-encoder/ms-marco-MiniLM-L-12-v2
datasets:
- cometadata/ror-pipeline-traces
pipeline_tag: text-classification
---

# ms-marco-ror-reranker

A cross-encoder reranker fine-tuned for Research Organization Registry (ROR) affiliation matching.

## Model Description

This model is fine-tuned from `cross-encoder/ms-marco-MiniLM-L-12-v2` on ROR affiliation matching data.
It reranks candidate ROR organizations given an affiliation string query.

## Training

- **Base model**: cross-encoder/ms-marco-MiniLM-L-12-v2
- **Training examples**: 45,061
- **Training traces**: 2,004
- **Negative sampling**: Hard negatives from retrieval candidates
- **Epochs**: 3
- **Batch size**: 16
- **Learning rate**: 2e-05
- **Max sequence length**: 256
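
The hard-negative setup listed above can be sketched roughly as follows. This is a minimal illustration, not the actual training code: the `Trace` record, its field names, and the placeholder ROR ids are hypothetical, and the real schema of the trace dataset may differ. The assumed idea is that each trace contributes its correct candidate as a positive and every other retrieved candidate as a negative.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Trace:
    """Hypothetical trace record: one affiliation query plus its retrieval output."""

    affiliation: str            # raw affiliation string (the query)
    candidates: dict[str, str]  # retrieved ROR id -> organization name
    gold_ror_id: str            # ROR id labeled as the correct match


def build_pairs(trace: Trace) -> list[tuple[str, str, float]]:
    """Turn one trace into (query, candidate_name, label) training examples.

    The gold candidate gets label 1.0. Every other retrieved candidate gets
    label 0.0; these are "hard" negatives because the retriever already
    considered them similar to the query.
    """
    return [
        (trace.affiliation, name, 1.0 if ror_id == trace.gold_ror_id else 0.0)
        for ror_id, name in trace.candidates.items()
    ]


# Placeholder ids and names, for illustration only.
trace = Trace(
    affiliation="Dept. of Physics, University of California, Berkeley",
    candidates={
        "ror-id-a": "University of California, Berkeley",
        "ror-id-b": "University of California, Los Angeles",
        "ror-id-c": "Lawrence Berkeley National Laboratory",
    },
    gold_ror_id="ror-id-a",
)
pairs = build_pairs(trace)
for query, name, label in pairs:
    print(label, name)
```

For scale, 45,061 examples from 2,004 traces is an average of about 22 examples per trace.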

## Usage

```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("cometadata/ms-marco-ror-reranker")

# Score affiliation-candidate pairs
pairs = [
    ["University of California, Berkeley", "University of California, Berkeley"],
    ["University of California, Berkeley", "University of California, Los Angeles"],
]
scores = model.predict(pairs)
print(scores)  # Higher score = better match
```

## Intended Use

This model is designed for reranking ROR organization candidates in affiliation matching pipelines.
It should be used after an initial retrieval step (e.g., dense retrieval with Snowflake Arctic).
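
The rerank step of such a two-stage pipeline might look like the sketch below. The `rerank` helper and `toy_scorer` are hypothetical names introduced here for illustration. The toy scorer (plain word overlap) only stands in for the model so the example runs without downloading weights, and the candidate list is made up rather than real retriever output.

```python
from __future__ import annotations

from typing import Callable, Sequence


def rerank(
    query: str,
    candidates: Sequence[str],
    score_pairs: Callable[[list[list[str]]], Sequence[float]],
    top_k: int = 3,
) -> list[tuple[str, float]]:
    """Score every (query, candidate) pair and return the top_k by score."""
    pairs = [[query, candidate] for candidate in candidates]
    scores = score_pairs(pairs)
    ranked = sorted(zip(candidates, scores), key=lambda item: item[1], reverse=True)
    return [(name, float(score)) for name, score in ranked[:top_k]]


def toy_scorer(pairs: list[list[str]]) -> list[float]:
    """Stand-in scorer (Jaccard word overlap), NOT the fine-tuned model."""
    scores = []
    for query, candidate in pairs:
        q = set(query.lower().split())
        c = set(candidate.lower().split())
        scores.append(len(q & c) / len(q | c))
    return scores


# Candidates as they might come back from a first-stage retriever.
retrieved = [
    "University of California, Los Angeles",
    "University of California, Berkeley",
    "Berkeley College",
]
top = rerank("University of California, Berkeley", retrieved, toy_scorer, top_k=2)
print(top)

# With the real model, pass its predict method instead of the toy scorer:
#   from sentence_transformers import CrossEncoder
#   model = CrossEncoder("cometadata/ms-marco-ror-reranker")
#   top = rerank(query, retrieved, model.predict, top_k=2)
```

Passing the scorer as a callable keeps the ranking logic independent of the model, so the first-stage retriever or the scoring model can be swapped without touching `rerank`.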

## Training Data

Trained on traces from `cometadata/ror-pipeline-traces` (`affrodb_s2aff_traces` config).

## Timestamp

2026-01-07T21:35:26.376404+00:00