---
language:
- en
- multilingual
license: apache-2.0
tags:
- cross-encoder
- reranker
- sentence-transformers
- ror
- affiliation-matching
base_model: cross-encoder/ms-marco-MiniLM-L-12-v2
datasets:
- cometadata/ror-pipeline-traces
pipeline_tag: text-classification
---
# ms-marco-ror-reranker
A cross-encoder reranker fine-tuned for Research Organization Registry (ROR) affiliation matching.
## Model Description
This model is fine-tuned from `cross-encoder/ms-marco-MiniLM-L-12-v2` on ROR affiliation matching data.
It reranks candidate ROR organizations given an affiliation string query.
## Training
- **Base model**: cross-encoder/ms-marco-MiniLM-L-12-v2
- **Training examples**: 45,061
- **Training traces**: 2,004
- **Negative sampling**: Hard negatives from retrieval candidates
- **Epochs**: 3
- **Batch size**: 16
- **Learning rate**: 2e-05
- **Max sequence length**: 256
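
The counts above work out to roughly 22 training examples per trace (45,061 / 2,004 ≈ 22.5). As a rough illustration of how hard negatives can be drawn from retrieval candidates, the sketch below labels the gold organization as the positive and every other retrieved candidate as a negative. The trace layout and field names (`affiliation`, `gold_ror_id`, `candidates`, `ror_id`, `name`) and the ROR IDs are hypothetical placeholders, not the actual schema of the training traces or the author's training script.

```python
from typing import Dict, List, Tuple


def build_training_pairs(trace: Dict) -> List[Tuple[str, str, int]]:
    """Turn one retrieval trace into (query, candidate_name, label) examples.

    Field names are hypothetical; adapt them to the real trace schema.
    """
    query = trace["affiliation"]
    gold_id = trace["gold_ror_id"]
    examples = []
    for cand in trace["candidates"]:
        # Candidates come from the retriever, so non-gold ones are "hard"
        # negatives: plausible matches that are nevertheless wrong.
        label = 1 if cand["ror_id"] == gold_id else 0
        examples.append((query, cand["name"], label))
    return examples


# Made-up trace with placeholder IDs, for illustration only.
trace = {
    "affiliation": "Dept. of Physics, University of California, Berkeley",
    "gold_ror_id": "id-a",
    "candidates": [
        {"ror_id": "id-a", "name": "University of California, Berkeley"},
        {"ror_id": "id-b", "name": "University of California, Los Angeles"},
        {"ror_id": "id-c", "name": "Lawrence Berkeley National Laboratory"},
    ],
}
examples = build_training_pairs(trace)
```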
## Usage
```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("cometadata/ms-marco-ror-reranker")

# Score affiliation-candidate pairs: [affiliation string, candidate organization name]
pairs = [
    ["University of California, Berkeley", "University of California, Berkeley"],
    ["University of California, Berkeley", "University of California, Los Angeles"],
]

scores = model.predict(pairs)
print(scores)  # Higher score = better match
```
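
To turn pair scores into a ranking for a single affiliation string, one option is a small helper like the sketch below. The `rerank` function and its `score_fn` parameter are illustrative names, not part of `sentence_transformers`. The exact-match scorer is only a stand-in so that the snippet runs without downloading the model.

```python
from typing import Callable, List, Sequence, Tuple


def rerank(
    query: str,
    candidates: Sequence[str],
    score_fn: Callable[[List[List[str]]], Sequence[float]],
) -> List[Tuple[str, float]]:
    """Score every (query, candidate) pair and sort best-first."""
    pairs = [[query, cand] for cand in candidates]
    scores = [float(s) for s in score_fn(pairs)]
    return sorted(zip(candidates, scores), key=lambda item: item[1], reverse=True)


def exact_match_score(pairs: List[List[str]]) -> List[float]:
    """Stand-in scorer (assumption): 1.0 for identical strings, else 0.0."""
    return [1.0 if query == cand else 0.0 for query, cand in pairs]


ranked = rerank(
    "University of California, Berkeley",
    [
        "University of California, Los Angeles",
        "University of California, Berkeley",
    ],
    exact_match_score,
)
print(ranked)
```

With the real model loaded as in the usage example, passing `model.predict` as `score_fn` should work, since it accepts a list of pairs and returns one score per pair.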
## Intended Use
This model is designed for reranking ROR organization candidates in affiliation matching pipelines.
It should be used after an initial retrieval step that produces a candidate list (e.g., dense retrieval with a Snowflake Arctic Embed model).
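
A minimal sketch of that two-stage arrangement follows. The retriever and scorer are toy token-overlap stand-ins chosen so the example is self-contained. In a real pipeline, `retrieve_fn` would query a dense index over ROR records and `score_fn` would be this model's `predict` method. All function names and the three-organization list are assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# Tiny in-memory "registry" standing in for a real ROR index (assumption).
ORGS = [
    "University of California, Berkeley",
    "University of California, Los Angeles",
    "Stanford University",
]


def _tokens(text: str) -> set:
    return set(text.lower().replace(",", " ").split())


def toy_retrieve(query: str, top_k: int) -> List[str]:
    """Stage 1 stand-in: keep the top_k organizations by shared-token count."""
    q = _tokens(query)
    hits = [org for org in ORGS if q & _tokens(org)]
    return sorted(hits, key=lambda org: len(q & _tokens(org)), reverse=True)[:top_k]


def toy_score(pairs: List[List[str]]) -> List[float]:
    """Stage 2 stand-in: Jaccard overlap in place of the cross-encoder."""
    return [len(_tokens(a) & _tokens(b)) / len(_tokens(a) | _tokens(b)) for a, b in pairs]


def match_affiliation(
    query: str,
    retrieve_fn: Callable[[str, int], List[str]],
    score_fn: Callable[[List[List[str]]], Sequence[float]],
    top_k: int = 10,
) -> List[Tuple[str, float]]:
    """Retrieve a short candidate list, then rerank it best-first."""
    candidates = retrieve_fn(query, top_k)
    if not candidates:
        return []
    scores = [float(s) for s in score_fn([[query, c] for c in candidates])]
    return sorted(zip(candidates, scores), key=lambda item: item[1], reverse=True)


result = match_affiliation(
    "Dept. of Physics, University of California, Berkeley",
    toy_retrieve,
    toy_score,
    top_k=2,
)
print(result)
```

Keeping the retriever cheap and the candidate list short matters here because a cross-encoder runs one forward pass per query-candidate pair.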
## Training Data
Trained on traces from `cometadata/ror-pipeline-traces` (`affrodb_s2aff_traces` config).
## Timestamp
2026-01-07T21:35:26.376404+00:00