---
language:
- en
- multilingual
license: apache-2.0
tags:
- cross-encoder
- reranker
- sentence-transformers
- ror
- affiliation-matching
base_model: cross-encoder/ms-marco-MiniLM-L-12-v2
datasets:
- cometadata/ror-pipeline-traces
pipeline_tag: text-classification
---

# ms-marco-ror-reranker

A cross-encoder reranker fine-tuned for Research Organization Registry (ROR) affiliation matching.

## Model Description

This model is fine-tuned from `cross-encoder/ms-marco-MiniLM-L-12-v2` on ROR affiliation matching data. It reranks candidate ROR organizations given an affiliation string query.

## Training

- **Base model**: cross-encoder/ms-marco-MiniLM-L-12-v2
- **Training examples**: 45,061
- **Training traces**: 2,004
- **Negative sampling**: Hard negatives from retrieval candidates
- **Epochs**: 3
- **Batch size**: 16
- **Learning rate**: 2e-05
- **Max sequence length**: 256

## Usage

```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("cometadata/ms-marco-ror-reranker")

# Score affiliation-candidate pairs
pairs = [
    ["University of California, Berkeley", "University of California, Berkeley"],
    ["University of California, Berkeley", "University of California, Los Angeles"],
]
scores = model.predict(pairs)
print(scores)  # Higher score = better match
```

## Intended Use

This model is designed for reranking ROR organization candidates in affiliation matching pipelines. It should be used after an initial retrieval step (e.g., dense retrieval with Snowflake Arctic).

## Training Data

Trained on traces from `cometadata/ror-pipeline-traces` (affrodb_s2aff_traces config).

## Timestamp

2026-01-07T21:35:26.376404+00:00
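
## Example: Reranking Retrieved Candidates

The reranking step described under Intended Use can be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual code: the helper name `rerank_candidates`, its `score_pairs` parameter, and the `top_k` option are hypothetical and chosen only for this example. The helper assumes the retrieval step has already produced a list of candidate organization names, and that the scorer takes `[affiliation, candidate]` pairs and returns one score per pair, as `model.predict` does in the Usage section.

```python
from typing import Callable, Iterable, List, Optional, Sequence, Tuple


def rerank_candidates(
    affiliation: str,
    candidates: Sequence[str],
    score_pairs: Callable[[List[List[str]]], Iterable[float]],
    top_k: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Sort retrieved candidates by cross-encoder score, best match first.

    `score_pairs` is any callable that maps a list of
    [affiliation, candidate] pairs to one score per pair
    (hypothetical parameter name; e.g. a CrossEncoder's `predict`).
    """
    # Build one (query, candidate) pair per retrieved organization.
    pairs = [[affiliation, name] for name in candidates]

    # Convert to plain floats so the result does not depend on the
    # scorer's return type (list, NumPy array, ...).
    scores = [float(s) for s in score_pairs(pairs)]

    # Higher score = better match, so sort in descending order.
    ranked = sorted(zip(candidates, scores), key=lambda item: item[1], reverse=True)

    return ranked if top_k is None else ranked[:top_k]
```

With the model loaded as in the Usage section, passing `model.predict` as `score_pairs` should return the candidates ordered from most to least likely match. Keeping the scorer as a parameter also makes it possible to test the ranking logic with a stub function, without downloading the model.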