---
language:
- en
- multilingual
license: apache-2.0
tags:
- cross-encoder
- reranker
- sentence-transformers
- ror
- affiliation-matching
base_model: cross-encoder/ms-marco-MiniLM-L-12-v2
datasets:
- cometadata/ror-pipeline-traces
pipeline_tag: text-classification
---
# ms-marco-ror-reranker
A cross-encoder reranker fine-tuned for Research Organization Registry (ROR) affiliation matching.
## Model Description
This model is fine-tuned from `cross-encoder/ms-marco-MiniLM-L-12-v2` on ROR affiliation matching data.
Given an affiliation string as the query, it scores each candidate ROR organization so that the candidates can be reranked by relevance.
## Training
- **Base model**: cross-encoder/ms-marco-MiniLM-L-12-v2
- **Training examples**: 45,061
- **Training traces**: 2,004
- **Negative sampling**: Hard negatives from retrieval candidates
- **Epochs**: 3
- **Batch size**: 16
- **Learning rate**: 2e-05
- **Max sequence length**: 256
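The hard-negative setup listed above can be pictured with a minimal sketch. It assumes each trace provides an affiliation query, the candidates returned by retrieval, and the ID of the correct organization; the function name, the field layout, and the placeholder IDs are hypothetical and are not taken from the actual dataset schema or training script.

```python
from typing import Dict, List, Tuple


def build_training_pairs(
    query: str,
    candidates: Dict[str, str],  # hypothetical layout: candidate ID -> organization name
    gold_id: str,
) -> List[Tuple[str, str, float]]:
    """Turn one retrieval trace into labeled (query, candidate, label) triples.

    The gold candidate gets label 1.0; every other retrieved candidate is
    treated as a hard negative with label 0.0.
    """
    examples = []
    for candidate_id, name in candidates.items():
        label = 1.0 if candidate_id == gold_id else 0.0
        examples.append((query, name, label))
    return examples


# Placeholder IDs for illustration only; these are not real ROR IDs.
trace_examples = build_training_pairs(
    query="Dept. of Physics, UC Berkeley",
    candidates={
        "id-berkeley": "University of California, Berkeley",
        "id-ucla": "University of California, Los Angeles",
        "id-davis": "University of California, Davis",
    },
    gold_id="id-berkeley",
)
```

Because the negatives come from the retriever's own output, they tend to be near-misses (for example, sibling campuses), which is the kind of distinction a reranker needs to learn.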
## Usage
```python
from sentence_transformers import CrossEncoder
model = CrossEncoder("cometadata/ms-marco-ror-reranker")
# Score affiliation-candidate pairs: each pair is [affiliation string, candidate organization name]
pairs = [
["University of California, Berkeley", "University of California, Berkeley"],
["University of California, Berkeley", "University of California, Los Angeles"],
]
scores = model.predict(pairs)
print(scores) # Higher score = better match
```
## Intended Use
This model is designed for reranking ROR organization candidates in affiliation matching pipelines.
It should be used after an initial retrieval step (e.g., dense retrieval with Snowflake Arctic).
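As a rough sketch of where the reranker sits in such a pipeline, the helper below takes the candidate names produced by a retrieval step and orders them by score. The function name `rerank_candidates` and its parameters are illustrative assumptions, and the scorer is passed in as a callable so the retrieval step and the model stay decoupled.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def rerank_candidates(
    affiliation: str,
    candidates: Sequence[str],
    score_pairs: Callable[[List[Tuple[str, str]]], Sequence[float]],
    top_k: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Order retrieved candidates by reranker score, best match first."""
    # Pair the affiliation query with every retrieved candidate name.
    pairs = [(affiliation, candidate) for candidate in candidates]
    # Convert to plain floats so NumPy or tensor scores sort consistently.
    scores = [float(score) for score in score_pairs(pairs)]
    ranked = sorted(zip(candidates, scores), key=lambda item: item[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]
```

With the model loaded as in the Usage section, `rerank_candidates(affiliation, retrieved_names, model.predict, top_k=5)` would return the five highest-scoring candidates, where `retrieved_names` stands for whatever list the upstream retriever returns.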
## Training Data
Trained on traces from `cometadata/ror-pipeline-traces` (`affrodb_s2aff_traces` config).
## Timestamp
2026-01-07T21:35:26.376404+00:00