---
language:
- en
license: apache-2.0
library_name: sentence-transformers
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
- modernbert
- biomedical
- systematic-review
- relevance-screening
- information-retrieval
- pubmed
datasets:
- Praise2112/siren-screening
base_model:
- Alibaba-NLP/gte-modernbert-base
pipeline_tag: sentence-similarity
---

# SIREN Screening Bi-encoder

<p align="center">
    <a href="https://huggingface.co/datasets/Praise2112/siren-screening">
        <img src="https://img.shields.io/badge/Dataset-siren--screening-yellow.svg" alt="Dataset"/>
    </a>
    <a href="https://huggingface.co/Praise2112/siren-screening-crossencoder">
        <img src="https://img.shields.io/badge/Reranker-siren--screening--crossencoder-blue.svg" alt="Cross-encoder"/>
    </a>
    <img src="https://img.shields.io/badge/License-Apache_2.0-green.svg" alt="License"/>
</p>

A bi-encoder model for **systematic review screening**, trained to retrieve documents matching eligibility criteria. Achieves a **+24 percentage point improvement in MRR@10** over MedCPT on criteria-based retrieval.

## Model Details

| Property | Value |
|----------|-------|
| Base Model | [Alibaba-NLP/gte-modernbert-base](https://huggingface.co/Alibaba-NLP/gte-modernbert-base) |
| Architecture | ModernBERT (22 layers, 768 hidden, 12 heads) |
| Parameters | ~149M |
| Max Sequence Length | 8192 tokens |
| Embedding Dimension | 768 |
| Training | Fine-tuned on [siren-screening](https://huggingface.co/datasets/Praise2112/siren-screening) + SLERP merged (t=0.4) |
| Pooling | Mean pooling |

## Intended Use

**Primary use case:** First-stage retrieval for systematic review screening pipelines.

Given eligibility criteria (e.g., "RCTs of aspirin in adults with diabetes, published after 2015"), retrieve candidate documents from a corpus for human review or downstream reranking.

**Recommended pipeline:**
1. Encode eligibility criteria with this bi-encoder
2. Retrieve top-k candidates via similarity search
3. Rerank with [siren-screening-crossencoder](https://huggingface.co/Praise2112/siren-screening-crossencoder) for 3-class classification

## Usage

### Sentence-Transformers

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Praise2112/siren-screening-biencoder")

# Encode eligibility criteria (query)
query = "Randomized controlled trials of aspirin for cardiovascular prevention in diabetic adults"
query_embedding = model.encode(query)

# Encode candidate documents
documents = [
    "A randomized trial of low-dose aspirin in 5,000 diabetic patients showed reduced MI risk...",
    "This cohort study examined statin use in elderly populations with hyperlipidemia...",
]
doc_embeddings = model.encode(documents)

# Compute similarity
similarities = model.similarity(query_embedding, doc_embeddings)
print(similarities)  # tensor([[0.85, 0.42]])
```
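Since the model was trained with Matryoshka dimensions (see Training below), embeddings can likely be truncated to a smaller dimension and re-normalized before similarity search to save memory; recent versions of sentence-transformers also expose a `truncate_dim` argument for this. The sketch below uses random stand-in vectors, and the `truncate_and_renormalize` helper is illustrative, not part of the library:

```python
import numpy as np

# Stand-ins for model.encode output (2 embeddings of dimension 768).
rng = np.random.default_rng(0)
full = rng.normal(size=(2, 768)).astype(np.float32)

def truncate_and_renormalize(embeddings, dim):
    """Keep the first `dim` components and L2-normalize, as Matryoshka training allows."""
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / np.clip(norms, 1e-12, None)

small = truncate_and_renormalize(full, 256)
print(small.shape)  # (2, 256)
```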

### Transformers

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("Praise2112/siren-screening-biencoder")
model = AutoModel.from_pretrained("Praise2112/siren-screening-biencoder")

def encode(texts):
    inputs = tokenizer(texts, padding=True, truncation=True, max_length=8192, return_tensors="pt")  # model supports up to 8192 tokens
    with torch.no_grad():
        outputs = model(**inputs)
    # Mean pooling
    attention_mask = inputs["attention_mask"]
    embeddings = outputs.last_hidden_state
    mask_expanded = attention_mask.unsqueeze(-1).expand(embeddings.size()).float()
    sum_embeddings = torch.sum(embeddings * mask_expanded, dim=1)
    sum_mask = torch.clamp(mask_expanded.sum(dim=1), min=1e-9)
    return F.normalize(sum_embeddings / sum_mask, p=2, dim=1)

query_emb = encode(["RCTs of aspirin in diabetic patients"])
doc_emb = encode(["A randomized trial of aspirin in 5,000 diabetic patients..."])
similarity = torch.mm(query_emb, doc_emb.T)
```

## Performance

### Internal Benchmark (SIREN Test Set)

| Model | MRR@10 | 95% CI | R@10 | NDCG@10 |
|-------|--------|--------|------|---------|
| [MedCPT](https://huggingface.co/ncbi/MedCPT-Query-Encoder) | 0.697 | (0.69-0.71) | 0.889 | 0.744 |
| [PubMedBERT](https://huggingface.co/NeuML/pubmedbert-base-embeddings) | 0.781 | (0.77-0.79) | 0.945 | 0.821 |
| [BGE-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) | 0.866 | (0.86-0.87) | 0.976 | 0.894 |
| [GTE-ModernBERT-base](https://huggingface.co/Alibaba-NLP/gte-modernbert-base) | 0.861 | (0.86-0.87) | 0.974 | 0.889 |
| **SIREN (this model)** | **0.937** | (0.93-0.94) | **0.996** | **0.952** |
| SIREN + Cross-encoder | **0.952** | (0.95-0.96) | 0.997 | 0.963 |

This is a +24 percentage point improvement in MRR@10 over [MedCPT](https://huggingface.co/ncbi/MedCPT-Query-Encoder), a leading biomedical retrieval model.
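For reference, the MRR@10 metric reported above can be computed as follows. This is an illustrative pure-Python sketch, not the evaluation script used to produce the table:

```python
def mrr_at_10(ranked_ids_per_query, relevant_ids_per_query):
    """Mean reciprocal rank of the first relevant document within the top 10."""
    total = 0.0
    for ranked, relevant in zip(ranked_ids_per_query, relevant_ids_per_query):
        for rank, doc_id in enumerate(ranked[:10], start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(ranked_ids_per_query)

# Two toy queries: first relevant doc at rank 1 and rank 2 -> (1 + 0.5) / 2
print(mrr_at_10([["a", "b"], ["x", "y"]], [{"a"}, {"y"}]))  # 0.75
```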

### External Benchmarks

| Benchmark | Metric | SIREN | [MedCPT](https://huggingface.co/ncbi/MedCPT-Query-Encoder) | [GTE-base](https://huggingface.co/Alibaba-NLP/gte-modernbert-base) |
|-----------|--------|-------|--------|----------|
| CLEF-TAR 2019 | NDCG@10 | **0.434** | 0.314 | 0.407 |
| CLEF-TAR 2019 | WSS@95 | 0.931 | 0.931 | 0.931 |
| SciFact (BEIR) | NDCG@10 | **0.770** | 0.708 | 0.753 |
| TRECCOVID (BEIR) | NDCG@10 | **0.674** | 0.567 | 0.655 |
| NFCorpus (BEIR) | NDCG@10 | **0.351** | 0.322 | 0.349 |

## Training

This model was created by:
1. Fine-tuning `gte-modernbert-base` on the [siren-screening](https://huggingface.co/datasets/Praise2112/siren-screening) dataset
2. SLERP merging with the base model (t=0.4) to preserve out-of-distribution generalization
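The SLERP step above interpolates along the sphere between fine-tuned and base weights. A minimal sketch on two toy weight vectors is shown below; a real merge (e.g. with a tool like mergekit) would apply this per parameter tensor of the two checkpoints, which this example does not do:

```python
import numpy as np

def slerp(a, b, t):
    """Spherical linear interpolation between two flattened weight tensors."""
    a_n = a / np.linalg.norm(a)
    b_n = b / np.linalg.norm(b)
    omega = np.arccos(np.clip(np.dot(a_n, b_n), -1.0, 1.0))
    if omega < 1e-7:  # nearly parallel: fall back to linear interpolation
        return (1.0 - t) * a + t * b
    s = np.sin(omega)
    return (np.sin((1.0 - t) * omega) / s) * a + (np.sin(t * omega) / s) * b

base = np.array([1.0, 0.0])
finetuned = np.array([0.0, 1.0])
merged = slerp(base, finetuned, 0.4)  # t=0.4 keeps the result closer to the base model
print(merged)
```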

**Training details:**
- Loss: Multiple Negatives Ranking Loss (MNRL) with in-batch negatives
- Batch size: 512 (via GradCache)
- Hard negatives: BM25-mined + LLM-generated
- Matryoshka dimensions: [768, 512, 256, 128, 64]
- Epochs: 1
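The MNRL objective listed above treats each query's paired document as the positive and every other in-batch document as a negative. A minimal sketch, assuming a similarity scale of 20 (a common sentence-transformers default, not necessarily the value used here):

```python
import torch
import torch.nn.functional as F

def mnrl_loss(query_emb, doc_emb, scale=20.0):
    """Multiple Negatives Ranking Loss: the diagonal of the in-batch
    similarity matrix holds the positive pairs; off-diagonals are negatives."""
    q = F.normalize(query_emb, dim=1)
    d = F.normalize(doc_emb, dim=1)
    scores = scale * q @ d.T             # (batch, batch) scaled cosine similarities
    labels = torch.arange(q.size(0))     # query i matches document i
    return F.cross_entropy(scores, labels)

torch.manual_seed(0)
q = torch.randn(4, 768)
d = q + 0.1 * torch.randn(4, 768)  # toy "positives" close to their queries
print(mnrl_loss(q, d).item())
```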

## Limitations

- **Synthetic queries, real documents**: The queries and relevance labels are LLM-generated, but the documents are real PubMed articles
- **English only**: Trained on English PubMed articles
- **Not a classifier**: This model retrieves candidates; use the cross-encoder for relevance classification

## Citation

```bibtex
@misc{oketola2026siren,
  title={SIREN: Improving Systematic Review Screening with Synthetic Training Data for Neural Retrievers},
  author={Praise Oketola},
  year={2026},
  howpublished={\url{https://huggingface.co/Praise2112/siren-screening-biencoder}},
  note={Bi-encoder model}
}
```

## License

Apache 2.0