---
language:
  - en
license: apache-2.0
library_name: sentence-transformers
tags:
  - sentence-transformers
  - feature-extraction
  - sentence-similarity
  - transformers
  - modernbert
  - biomedical
  - systematic-review
  - relevance-screening
  - information-retrieval
  - pubmed
datasets:
  - Praise2112/siren-screening
base_model:
  - Alibaba-NLP/gte-modernbert-base
pipeline_tag: sentence-similarity
---

# SIREN Screening Bi-encoder


A bi-encoder model for systematic review screening, trained to retrieve documents matching eligibility criteria. Achieves +24 percentage points MRR@10 over MedCPT on criteria-based retrieval.

## Model Details

| Property | Value |
|----------|-------|
| Base Model | Alibaba-NLP/gte-modernbert-base |
| Architecture | ModernBERT (22 layers, 768 hidden, 12 heads) |
| Parameters | ~149M |
| Max Sequence Length | 8192 tokens |
| Embedding Dimension | 768 |
| Training | Fine-tuned on siren-screening, then SLERP-merged with the base model (t=0.4) |
| Pooling | Mean pooling |
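Because the model was trained with Matryoshka dimensions (see Training below), an embedding can be truncated to a leading prefix and re-normalized when a smaller vector is preferred. A minimal sketch with a random stand-in vector (no model download; the vector here is not a real embedding):

```python
import numpy as np

rng = np.random.default_rng(0)
full = rng.standard_normal(768)    # stand-in for a 768-dim embedding
full /= np.linalg.norm(full)       # unit-normalize, as mean pooling + L2 norm would

def truncate(embedding, dim):
    """Keep the first `dim` Matryoshka dimensions and re-normalize to unit length."""
    prefix = embedding[:dim]
    return prefix / np.linalg.norm(prefix)

small = truncate(full, 256)
print(small.shape)  # (256,)
```

Smaller prefixes trade some accuracy for faster similarity search and a smaller index.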

## Intended Use

**Primary use case:** First-stage retrieval for systematic review screening pipelines.

Given eligibility criteria (e.g., "RCTs of aspirin in adults with diabetes, published after 2015"), retrieve candidate documents from a corpus for human review or downstream reranking.

Recommended pipeline:

1. Encode eligibility criteria with this bi-encoder
2. Retrieve top-k candidates via similarity search
3. Rerank with `siren-screening-crossencoder` for 3-class classification
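The retrieval step (2) reduces to a cosine-similarity top-k over unit-normalized embeddings. A minimal sketch with toy vectors standing in for encoder output (no model download; real embeddings come from `model.encode`):

```python
import numpy as np

def top_k(query_emb, doc_embs, k=2):
    """Return indices of the k most similar documents.

    On unit vectors, cosine similarity is just the dot product.
    """
    scores = doc_embs @ query_emb
    return np.argsort(-scores)[:k], scores

# Toy vectors standing in for encoder output.
query = np.array([1.0, 0.0, 0.0])
docs = np.array([
    [0.9, 0.1, 0.0],   # close to the query
    [0.0, 1.0, 0.0],   # orthogonal
    [0.7, 0.7, 0.0],   # in between
])
docs /= np.linalg.norm(docs, axis=1, keepdims=True)

indices, scores = top_k(query / np.linalg.norm(query), docs, k=2)
print(indices)  # [0 2]
```

In practice this scoring runs inside a vector index (e.g. FAISS) rather than a dense matrix product over the whole corpus.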

## Usage

### Sentence-Transformers

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Praise2112/siren-screening-biencoder")

# Encode eligibility criteria (query)
query = "Randomized controlled trials of aspirin for cardiovascular prevention in diabetic adults"
query_embedding = model.encode(query)

# Encode candidate documents
documents = [
    "A randomized trial of low-dose aspirin in 5,000 diabetic patients showed reduced MI risk...",
    "This cohort study examined statin use in elderly populations with hyperlipidemia...",
]
doc_embeddings = model.encode(documents)

# Compute similarity
similarities = model.similarity(query_embedding, doc_embeddings)
print(similarities)  # e.g. tensor([[0.85, 0.42]]): higher score for the relevant trial
```

### Transformers

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("Praise2112/siren-screening-biencoder")
model = AutoModel.from_pretrained("Praise2112/siren-screening-biencoder")

def encode(texts):
    # The model supports sequences up to 8192 tokens
    inputs = tokenizer(texts, padding=True, truncation=True, max_length=8192, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # Mean pooling over non-padding tokens, then L2 normalization
    attention_mask = inputs["attention_mask"]
    embeddings = outputs.last_hidden_state
    mask_expanded = attention_mask.unsqueeze(-1).expand(embeddings.size()).float()
    sum_embeddings = torch.sum(embeddings * mask_expanded, dim=1)
    sum_mask = torch.clamp(mask_expanded.sum(dim=1), min=1e-9)
    return F.normalize(sum_embeddings / sum_mask, p=2, dim=1)

query_emb = encode(["RCTs of aspirin in diabetic patients"])
doc_emb = encode(["A randomized trial of aspirin in 5,000 diabetic patients..."])
similarity = torch.mm(query_emb, doc_emb.T)
```
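The masked mean pooling used in `encode` can be checked on a toy batch: with padding masked out, the pooled vector equals the plain average of the real token embeddings. A self-contained check with random tensors (no model involved):

```python
import torch

torch.manual_seed(0)
hidden = torch.randn(1, 4, 8)        # batch=1, 4 token positions, dim=8
mask = torch.tensor([[1, 1, 1, 0]])  # last position is padding

# Masked mean pooling, same arithmetic as encode() above.
m = mask.unsqueeze(-1).expand(hidden.size()).float()
pooled = (hidden * m).sum(dim=1) / m.sum(dim=1).clamp(min=1e-9)

# Reference: average the three real tokens directly.
reference = hidden[0, :3].mean(dim=0)
print(torch.allclose(pooled[0], reference))  # True
```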

## Performance

### Internal Benchmark (SIREN Test Set)

| Model | MRR@10 | 95% CI | R@10 | NDCG@10 |
|-------|--------|--------|------|---------|
| MedCPT | 0.697 | (0.69-0.71) | 0.889 | 0.744 |
| PubMedBERT | 0.781 | (0.77-0.79) | 0.945 | 0.821 |
| BGE-base-en-v1.5 | 0.866 | (0.86-0.87) | 0.976 | 0.894 |
| GTE-ModernBERT-base | 0.861 | (0.86-0.87) | 0.974 | 0.889 |
| **SIREN (this model)** | **0.937** | (0.93-0.94) | **0.996** | **0.952** |
| SIREN + Cross-encoder | 0.952 | (0.95-0.96) | 0.997 | 0.963 |

This is a +24-percentage-point MRR@10 improvement over MedCPT, a leading biomedical retrieval model.

### External Benchmarks

| Benchmark | Metric | SIREN | MedCPT | GTE-base |
|-----------|--------|-------|--------|----------|
| CLEF-TAR 2019 | NDCG@10 | 0.434 | 0.314 | 0.407 |
| CLEF-TAR 2019 | WSS@95 | 0.931 | 0.931 | 0.931 |
| SciFact (BEIR) | NDCG@10 | 0.770 | 0.708 | 0.753 |
| TREC-COVID (BEIR) | NDCG@10 | 0.674 | 0.567 | 0.655 |
| NFCorpus (BEIR) | NDCG@10 | 0.351 | 0.322 | 0.349 |

## Training

This model was created by:

1. Fine-tuning `gte-modernbert-base` on the `siren-screening` dataset
2. SLERP-merging with the base model (t=0.4) to preserve out-of-distribution generalization
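SLERP (spherical linear interpolation) interpolates each weight tensor along the arc between the two checkpoints rather than along the straight line, preserving parameter norms better than plain averaging. A toy illustration on two vectors (not actual model weights; a real merge applies this per tensor, and which endpoint t=0.4 leans toward depends on the merge tool's argument order):

```python
import numpy as np

def slerp(a, b, t):
    """Spherically interpolate between vectors a and b; t is the fraction toward b."""
    a_n, b_n = a / np.linalg.norm(a), b / np.linalg.norm(b)
    omega = np.arccos(np.clip(a_n @ b_n, -1.0, 1.0))  # angle between directions
    if np.isclose(omega, 0.0):
        return (1 - t) * a + t * b                    # nearly parallel: fall back to LERP
    return (np.sin((1 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)

base = np.array([1.0, 0.0])
finetuned = np.array([0.0, 1.0])
merged = slerp(base, finetuned, t=0.4)
print(merged)  # lies on the unit arc, closer to `base`
```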

Training details:

- **Loss:** Multiple Negatives Ranking Loss (MNRL) with in-batch negatives
- **Batch size:** 512 (via GradCache)
- **Hard negatives:** BM25-mined + LLM-generated
- **Matryoshka dimensions:** [768, 512, 256, 128, 64]
- **Epochs:** 1
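MNRL with in-batch negatives treats, for each query, its paired positive document as the target class and every other positive in the batch as a negative, i.e. cross-entropy over the in-batch similarity matrix. A minimal sketch in plain PyTorch (random embeddings standing in for encoder output; the `scale` temperature of 20 is a common default, an assumption here, not a documented training setting):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
batch, dim, scale = 4, 16, 20.0
query_embs = F.normalize(torch.randn(batch, dim), dim=1)  # stand-in query embeddings
doc_embs = F.normalize(torch.randn(batch, dim), dim=1)    # paired positive documents

# Similarity matrix: row i scores query i against every document in the batch.
scores = scale * query_embs @ doc_embs.T

# The diagonal entry (query i, doc i) is the positive; the rest are in-batch negatives.
labels = torch.arange(batch)
loss = F.cross_entropy(scores, labels)
print(loss.item())
```

Larger batches give more in-batch negatives per query, which is why GradCache is used to reach an effective batch size of 512.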

## Limitations

- **Synthetic queries, real documents:** The queries and relevance labels are LLM-generated, but the documents are real PubMed articles.
- **English only:** Trained on English PubMed articles.
- **Not a classifier:** This model retrieves candidates; use the cross-encoder for relevance classification.

## Citation

```bibtex
@misc{oketola2026siren,
  title={SIREN: Improving Systematic Review Screening with Synthetic Training Data for Neural Retrievers},
  author={Praise Oketola},
  year={2026},
  howpublished={\url{https://huggingface.co/Praise2112/siren-screening-biencoder}},
  note={Bi-encoder model}
}
```

## License

Apache 2.0