---
language:
- en
license: apache-2.0
library_name: sentence-transformers
tags:
- sentence-transformers
- cross-encoder
- text-classification
- transformers
- modernbert
- biomedical
- systematic-review
- relevance-screening
- reranking
- pubmed
datasets:
- Praise2112/siren-screening
base_model:
- Alibaba-NLP/gte-modernbert-base
pipeline_tag: text-classification
---

# SIREN Screening Cross-encoder
<p align="center">
  <a href="https://huggingface.co/datasets/Praise2112/siren-screening">
    <img src="https://img.shields.io/badge/Dataset-siren--screening-yellow.svg" alt="Dataset"/>
  </a>
  <a href="https://huggingface.co/Praise2112/siren-screening-biencoder">
    <img src="https://img.shields.io/badge/Retriever-siren--screening--biencoder-blue.svg" alt="Bi-encoder"/>
  </a>
  <img src="https://img.shields.io/badge/License-Apache_2.0-green.svg" alt="License"/>
</p>

A **3-class cross-encoder** for systematic review screening that classifies query-document pairs as **Relevant**, **Partial**, or **Irrelevant**. Designed to rerank candidates from the [siren-screening-biencoder](https://huggingface.co/Praise2112/siren-screening-biencoder).
## Model Details

| Property | Value |
|----------|-------|
| Base Model | GTE-reranker-ModernBERT-base |
| Architecture | ModernBertForSequenceClassification (22 layers, 768 hidden) |
| Parameters | ~149M |
| Max Sequence Length | 8192 tokens |
| Output | 3-class probabilities (Irrelevant, Partial, Relevant) |
| Training | Fine-tuned on [siren-screening](https://huggingface.co/datasets/Praise2112/siren-screening) + SLERP merged (t=0.2) |
### Label Definitions

| Label | ID | Definition |
|-------|----|------------|
| **Irrelevant** | 0 | Document matches NONE of the eligibility criteria |
| **Partial** | 1 | Document matches SOME but not ALL criteria |
| **Relevant** | 2 | Document matches ALL criteria |
## Intended Use

**Primary use case:** second-stage reranking in systematic review screening pipelines.

After retrieving candidates with a bi-encoder, use this cross-encoder to:

1. **Rerank** documents for better precision at top ranks
2. **Classify** relevance for triage (prioritize Relevant, defer Partial, skip Irrelevant)

**Recommended pipeline:**

1. Retrieve the top-100 candidates with [siren-screening-biencoder](https://huggingface.co/Praise2112/siren-screening-biencoder)
2. Rerank with this cross-encoder
3. Use the relevance labels to prioritize human screening
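The recommended pipeline can be sketched end-to-end with sentence-transformers. This is a minimal sketch, not released code: the `screen` and `triage` helpers, the corpus handling, and the bucketing scheme are illustrative; only the model names come from this card.

```python
# Sketch of the two-stage screening pipeline (assumes sentence-transformers
# is installed; everything except the model names is illustrative).
LABELS = {0: "Irrelevant", 1: "Partial", 2: "Relevant"}

def triage(label_ids):
    """Bucket candidate indices by predicted class for human screening."""
    buckets = {"Relevant": [], "Partial": [], "Irrelevant": []}
    for idx, label_id in enumerate(label_ids):
        buckets[LABELS[int(label_id)]].append(idx)
    return buckets

def screen(query, corpus, top_k=100):
    from sentence_transformers import CrossEncoder, SentenceTransformer, util

    # Stage 1: dense retrieval with the bi-encoder.
    biencoder = SentenceTransformer("Praise2112/siren-screening-biencoder")
    doc_emb = biencoder.encode(corpus, convert_to_tensor=True)
    query_emb = biencoder.encode(query, convert_to_tensor=True)
    hits = util.semantic_search(query_emb, doc_emb, top_k=top_k)[0]

    # Stage 2: rerank the candidates with this cross-encoder.
    crossencoder = CrossEncoder("Praise2112/siren-screening-crossencoder")
    pairs = [(query, corpus[hit["corpus_id"]]) for hit in hits]
    probs = crossencoder.predict(pairs, apply_softmax=True)

    # Stage 3: triage by predicted class to prioritize human screening.
    return triage(probs.argmax(axis=1))

# e.g. screen("RCTs of aspirin in diabetic adults", corpus)
# -> {"Relevant": [...], "Partial": [...], "Irrelevant": [...]}
```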
## Usage

### Sentence-Transformers CrossEncoder

```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("Praise2112/siren-screening-crossencoder")

# Pairs of (query, document)
pairs = [
    ("RCTs of aspirin in diabetic adults", "A randomized trial of aspirin in 5,000 diabetic patients showed..."),
    ("RCTs of aspirin in diabetic adults", "This cohort study examined statin use in elderly populations..."),
]

# Get 3-class probabilities (without apply_softmax=True, predict returns raw logits)
scores = model.predict(pairs, apply_softmax=True)
print(scores)
# Example output: array([[ 0.02, 0.15, 0.83],   # Relevant
#                        [ 0.91, 0.07, 0.02]])  # Irrelevant
```
### Transformers (Direct)

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("Praise2112/siren-screening-crossencoder")
model = AutoModelForSequenceClassification.from_pretrained("Praise2112/siren-screening-crossencoder")

query = "RCTs of aspirin in diabetic adults"
document = "A randomized trial of aspirin in 5,000 diabetic patients showed reduced MI risk..."

inputs = tokenizer(
    query, document,
    padding=True,
    truncation=True,
    max_length=768,
    return_tensors="pt"
)

with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)

print(f"Irrelevant: {probs[0, 0]:.3f}")
print(f"Partial: {probs[0, 1]:.3f}")
print(f"Relevant: {probs[0, 2]:.3f}")

# Get the predicted label
label_id = probs.argmax().item()
labels = {0: "Irrelevant", 1: "Partial", 2: "Relevant"}
print(f"Prediction: {labels[label_id]}")
```
### Scoring for Reranking

For reranking, convert the 3-class probabilities to a single score:

```python
def rerank_score(probs):
    """Convert 3-class probabilities to a ranking score.

    Higher score = more relevant.
    Partial gets partial credit (1x), Relevant gets full credit (2x).
    """
    return probs[1] + 2 * probs[2]  # P(Partial) + 2 * P(Relevant)

# Example
probs = [0.02, 0.15, 0.83]  # [Irrelevant, Partial, Relevant]
score = rerank_score(probs)  # 0.15 + 2 * 0.83 = 1.81
```
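Applied to a batch of cross-encoder outputs, this score induces a ranking. A small self-contained sketch (the probability rows are illustrative values, not real model outputs):

```python
def rerank_score(probs):
    # P(Partial) + 2 * P(Relevant); higher = more relevant.
    return probs[1] + 2 * probs[2]

# Illustrative softmax outputs, one row per candidate: [Irrelevant, Partial, Relevant]
candidate_probs = [
    [0.02, 0.15, 0.83],  # mostly Relevant   -> score 1.81
    [0.91, 0.07, 0.02],  # mostly Irrelevant -> score 0.11
    [0.20, 0.70, 0.10],  # mostly Partial    -> score 0.90
]

# Sort candidate indices by descending score.
order = sorted(range(len(candidate_probs)),
               key=lambda i: rerank_score(candidate_probs[i]),
               reverse=True)
print(order)  # [0, 2, 1]
```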
## Performance

### Classification Accuracy

| Metric | Value |
|--------|-------|
| Accuracy | 90.6% |
| F1 (Macro) | 90.6% |
| Irrelevant F1 | 92.2% |
| Partial F1 | 87.4% |
| Relevant F1 | 92.3% |

### Reranking Impact (MRR@10)

| Configuration | MRR@10 | Delta |
|---------------|--------|-------|
| SIREN bi-encoder alone | 0.937 | - |
| + SIREN cross-encoder | **0.952** | +1.5pp |
| + [BGE-reranker](https://huggingface.co/BAAI/bge-reranker-base) (general) | 0.846 | -9.2pp |

General-purpose rerankers such as [BGE](https://huggingface.co/BAAI/bge-reranker-base) actually hurt performance on screening queries because they are optimized for topical relevance, not criteria matching.
### Cross-encoder Transfer

This cross-encoder also improves other retrievers:

| Bi-encoder | Cross-encoder | MRR@10 | Delta |
|------------|---------------|--------|-------|
| [MedCPT](https://huggingface.co/ncbi/MedCPT-Query-Encoder) | - | 0.697 | - |
| [MedCPT](https://huggingface.co/ncbi/MedCPT-Query-Encoder) | [MedCPT-CE](https://huggingface.co/ncbi/MedCPT-Cross-Encoder) | 0.826 | +12.9pp |
| [MedCPT](https://huggingface.co/ncbi/MedCPT-Query-Encoder) | **SIREN-CE** | **0.931** | **+23.4pp** |
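For reference, MRR@10 (the metric in the tables above) is the reciprocal rank of the first relevant document within the top 10, averaged over queries. A minimal sketch with hypothetical rankings (the document ids are made up for illustration):

```python
def mrr_at_10(rankings, relevant_sets):
    """Mean reciprocal rank at cutoff 10.

    rankings: per-query lists of document ids in ranked order.
    relevant_sets: per-query sets of relevant document ids.
    A query contributes 0 if no relevant document appears in its top 10.
    """
    total = 0.0
    for ranking, relevant in zip(rankings, relevant_sets):
        for rank, doc_id in enumerate(ranking[:10], start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(rankings)

# Hypothetical: first relevant hit at rank 1 for query 1, rank 2 for query 2.
print(mrr_at_10([[3, 5, 7], [9, 3, 5]], [{3}, {3}]))  # (1.0 + 0.5) / 2 = 0.75
```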
## Training

This model was created by:

1. Fine-tuning on the [siren-screening](https://huggingface.co/datasets/Praise2112/siren-screening) dataset with 3-class labels
2. SLERP-merging the encoder layers with the base model (t=0.2) to preserve generalization

**Training details:**

- Loss: cross-entropy
- Batch size: 32 (16 x 2 gradient accumulation)
- Learning rate: 2e-5
- Epochs: 1
- Max length: 768 tokens
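The SLERP merge in step 2 interpolates each weight tensor along the arc between the two checkpoints rather than linearly. A minimal numpy sketch of the operation, treating each tensor as a flat vector (which endpoint t=0 refers to is an assumption here, not specified by the card):

```python
import numpy as np

def slerp(w_base, w_tuned, t=0.2, eps=1e-8):
    """Spherical linear interpolation between two weight tensors.

    t=0 returns w_base, t=1 returns w_tuned; falls back to linear
    interpolation when the tensors are nearly parallel.
    """
    a, b = w_base.ravel(), w_tuned.ravel()
    cos_omega = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps)
    omega = np.arccos(np.clip(cos_omega, -1.0, 1.0))
    if omega < eps:  # nearly parallel: LERP is the stable limit
        merged = (1 - t) * a + t * b
    else:
        merged = (np.sin((1 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)
    return merged.reshape(w_base.shape)
```

With t=0.2 the merged weights stay close to one endpoint, which is consistent with the stated goal of preserving the base model's generalization while keeping most of the fine-tuned behavior.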
## Limitations

- **Synthetic queries, real documents**: the queries and relevance labels are LLM-generated, but the documents are real PubMed articles
- **English only**: trained on English PubMed content
## Citation

```bibtex
@misc{oketola2026siren,
  title={SIREN: Improving Systematic Review Screening with Synthetic Training Data for Neural Retrievers},
  author={Praise Oketola},
  year={2026},
  howpublished={\url{https://huggingface.co/Praise2112/siren-screening-crossencoder}},
  note={Cross-encoder model}
}
```
## License

Apache 2.0