---
language:
- en
license: apache-2.0
library_name: sentence-transformers
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
- modernbert
- biomedical
- systematic-review
- relevance-screening
- information-retrieval
- pubmed
datasets:
- Praise2112/siren-screening
base_model:
- Alibaba-NLP/gte-modernbert-base
pipeline_tag: sentence-similarity
---
# SIREN Screening Bi-encoder
<p align="center">
<a href="https://huggingface.co/datasets/Praise2112/siren-screening">
<img src="https://img.shields.io/badge/Dataset-siren--screening-yellow.svg" alt="Dataset"/>
</a>
<a href="https://huggingface.co/Praise2112/siren-screening-crossencoder">
<img src="https://img.shields.io/badge/Reranker-siren--screening--crossencoder-blue.svg" alt="Cross-encoder"/>
</a>
<img src="https://img.shields.io/badge/License-Apache_2.0-green.svg" alt="License"/>
</p>
A bi-encoder model for **systematic review screening**, trained to retrieve documents matching eligibility criteria. Achieves **+24 percentage points MRR@10** over MedCPT on criteria-based retrieval.
## Model Details
| Property | Value |
|----------|-------|
| Base Model | [Alibaba-NLP/gte-modernbert-base](https://huggingface.co/Alibaba-NLP/gte-modernbert-base) |
| Architecture | ModernBERT (22 layers, 768 hidden, 12 heads) |
| Parameters | ~149M |
| Max Sequence Length | 8192 tokens |
| Embedding Dimension | 768 |
| Training | Fine-tuned on [siren-screening](https://huggingface.co/datasets/Praise2112/siren-screening) + SLERP merged (t=0.4) |
| Pooling | Mean pooling |
## Intended Use
**Primary use case:** First-stage retrieval for systematic review screening pipelines.
Given eligibility criteria (e.g., "RCTs of aspirin in adults with diabetes, published after 2015"), retrieve candidate documents from a corpus for human review or downstream reranking.
**Recommended pipeline:**
1. Encode eligibility criteria with this bi-encoder
2. Retrieve top-k candidates via similarity search
3. Rerank with [siren-screening-crossencoder](https://huggingface.co/Praise2112/siren-screening-crossencoder) for 3-class classification
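A minimal sketch of steps 1–2 with a brute-force cosine-similarity search (the embeddings here are random stand-ins for `model.encode(...)` output; in a real pipeline a vector index such as FAISS would typically replace the exhaustive scan):

```python
import numpy as np

def top_k(query_emb: np.ndarray, doc_embs: np.ndarray, k: int = 10):
    """Return indices and scores of the k most similar documents by cosine similarity."""
    # Normalize so the dot product equals cosine similarity
    q = query_emb / np.linalg.norm(query_emb)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    scores = d @ q
    order = np.argsort(-scores)[:k]  # highest-scoring documents first
    return order, scores[order]

# Toy 768-dim embeddings standing in for bi-encoder output
rng = np.random.default_rng(0)
query = rng.normal(size=768)           # encoded eligibility criteria
docs = rng.normal(size=(100, 768))     # encoded candidate abstracts
indices, scores = top_k(query, docs, k=5)
```

The returned `indices` identify the candidates to pass on to the cross-encoder for reranking.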
## Usage
### Sentence-Transformers
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Praise2112/siren-screening-biencoder")

# Encode eligibility criteria (query)
query = "Randomized controlled trials of aspirin for cardiovascular prevention in diabetic adults"
query_embedding = model.encode(query)

# Encode candidate documents
documents = [
    "A randomized trial of low-dose aspirin in 5,000 diabetic patients showed reduced MI risk...",
    "This cohort study examined statin use in elderly populations with hyperlipidemia...",
]
doc_embeddings = model.encode(documents)

# Compute similarity
similarities = model.similarity(query_embedding, doc_embeddings)
print(similarities)  # tensor([[0.85, 0.42]])
```
### Transformers
```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("Praise2112/siren-screening-biencoder")
model = AutoModel.from_pretrained("Praise2112/siren-screening-biencoder")

def encode(texts):
    inputs = tokenizer(texts, padding=True, truncation=True, max_length=8192, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # Mean pooling over non-padding tokens
    attention_mask = inputs["attention_mask"]
    embeddings = outputs.last_hidden_state
    mask_expanded = attention_mask.unsqueeze(-1).expand(embeddings.size()).float()
    sum_embeddings = torch.sum(embeddings * mask_expanded, dim=1)
    sum_mask = torch.clamp(mask_expanded.sum(dim=1), min=1e-9)
    return F.normalize(sum_embeddings / sum_mask, p=2, dim=1)

query_emb = encode(["RCTs of aspirin in diabetic patients"])
doc_emb = encode(["A randomized trial of aspirin in 5,000 diabetic patients..."])
similarity = torch.mm(query_emb, doc_emb.T)
```
## Performance
### Internal Benchmark (SIREN Test Set)
| Model | MRR@10 | 95% CI | R@10 | NDCG@10 |
|-------|--------|--------|------|---------|
| [MedCPT](https://huggingface.co/ncbi/MedCPT-Query-Encoder) | 0.697 | (0.69-0.71) | 0.889 | 0.744 |
| [PubMedBERT](https://huggingface.co/NeuML/pubmedbert-base-embeddings) | 0.781 | (0.77-0.79) | 0.945 | 0.821 |
| [BGE-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) | 0.866 | (0.86-0.87) | 0.976 | 0.894 |
| [GTE-ModernBERT-base](https://huggingface.co/Alibaba-NLP/gte-modernbert-base) | 0.861 | (0.86-0.87) | 0.974 | 0.889 |
| **SIREN (this model)** | **0.937** | (0.93-0.94) | **0.996** | **0.952** |
| SIREN + Cross-encoder | **0.952** | (0.95-0.96) | 0.997 | 0.963 |
This is a +24 percentage point improvement in MRR@10 over [MedCPT](https://huggingface.co/ncbi/MedCPT-Query-Encoder), the leading biomedical retrieval model.
### External Benchmarks
| Benchmark | Metric | SIREN | [MedCPT](https://huggingface.co/ncbi/MedCPT-Query-Encoder) | [GTE-base](https://huggingface.co/Alibaba-NLP/gte-modernbert-base) |
|-----------|--------|-------|--------|----------|
| CLEF-TAR 2019 | NDCG@10 | **0.434** | 0.314 | 0.407 |
| CLEF-TAR 2019 | WSS@95 | 0.931 | 0.931 | 0.931 |
| SciFact (BEIR) | NDCG@10 | **0.770** | 0.708 | 0.753 |
| TRECCOVID (BEIR) | NDCG@10 | **0.674** | 0.567 | 0.655 |
| NFCorpus (BEIR) | NDCG@10 | **0.351** | 0.322 | 0.349 |
## Training
This model was created by:
1. Fine-tuning `gte-modernbert-base` on the [siren-screening](https://huggingface.co/datasets/Praise2112/siren-screening) dataset
2. SLERP merging with the base model (t=0.4) to preserve out-of-distribution generalization
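As an illustration of step 2, here is a minimal SLERP sketch applied per flattened weight tensor (the exact merge recipe and tooling are not specified above; this assumes straightforward per-tensor spherical interpolation):

```python
import numpy as np

def slerp(a: np.ndarray, b: np.ndarray, t: float = 0.4, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation between two flattened weight tensors."""
    a_n = a / (np.linalg.norm(a) + eps)
    b_n = b / (np.linalg.norm(b) + eps)
    dot = np.clip(np.dot(a_n, b_n), -1.0, 1.0)
    omega = np.arccos(dot)  # angle between the two weight vectors
    if omega < eps:
        # Nearly parallel vectors: fall back to linear interpolation
        return (1 - t) * a + t * b
    so = np.sin(omega)
    return (np.sin((1 - t) * omega) / so) * a + (np.sin(t * omega) / so) * b

# Toy example: orthogonal unit "weight vectors" for base and fine-tuned models
base = np.array([1.0, 0.0, 0.0])
tuned = np.array([0.0, 1.0, 0.0])
merged = slerp(base, tuned, t=0.4)  # 40% of the way toward the fine-tuned weights
```

Unlike plain linear averaging, SLERP interpolates along the arc between the two weight vectors, which for unit-norm inputs preserves the norm of the result.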
**Training details:**
- Loss: Multiple Negatives Ranking Loss (MNRL) with in-batch negatives
- Batch size: 512 (via GradCache)
- Hard negatives: BM25-mined + LLM-generated
- Matryoshka dimensions: [768, 512, 256, 128, 64]
- Epochs: 1
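The MNRL objective above can be sketched as cross-entropy over the in-batch similarity matrix, where each query's positive is on the diagonal and every other document in the batch acts as a negative (the similarity scale of 20 is illustrative, not a reported hyperparameter):

```python
import numpy as np

def mnrl_loss(q_embs: np.ndarray, d_embs: np.ndarray, scale: float = 20.0) -> float:
    """Multiple Negatives Ranking Loss with in-batch negatives.

    Query q_i is paired with positive d_i; every other d_j in the batch
    serves as a negative. Equivalent to softmax cross-entropy over the
    scaled cosine-similarity matrix with the diagonal as the target class.
    """
    q = q_embs / np.linalg.norm(q_embs, axis=1, keepdims=True)
    d = d_embs / np.linalg.norm(d_embs, axis=1, keepdims=True)
    sims = scale * (q @ d.T)                   # (batch, batch) similarity matrix
    sims -= sims.max(axis=1, keepdims=True)    # numerical stability
    log_probs = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

rng = np.random.default_rng(0)
q = rng.normal(size=(8, 64))
loss_random = mnrl_loss(q, rng.normal(size=(8, 64)))  # unrelated documents: high loss
loss_aligned = mnrl_loss(q, q)                        # positives match queries: near-zero loss
```

Training drives the diagonal (query–positive) similarities above the off-diagonal ones, which is exactly the ranking behavior the retrieval metrics above measure.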
## Limitations
- **Synthetic queries, real documents**: The queries and relevance labels are LLM-generated, but the documents are real PubMed articles
- **English only**: Trained on English PubMed articles
- **Not a classifier**: This model retrieves candidates; use the cross-encoder for relevance classification
## Citation
```bibtex
@misc{oketola2026siren,
  title={SIREN: Improving Systematic Review Screening with Synthetic Training Data for Neural Retrievers},
  author={Praise Oketola},
  year={2026},
  howpublished={\url{https://huggingface.co/Praise2112/siren-screening-biencoder}},
  note={Bi-encoder model}
}
```
## License
Apache 2.0