---
language:
- en
license: apache-2.0
library_name: sentence-transformers
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
- modernbert
- biomedical
- systematic-review
- relevance-screening
- information-retrieval
- pubmed
datasets:
- Praise2112/siren-screening
base_model:
- Alibaba-NLP/gte-modernbert-base
pipeline_tag: sentence-similarity
---

# SIREN Screening Bi-encoder

<p align="center">
  <a href="https://huggingface.co/datasets/Praise2112/siren-screening">
    <img src="https://img.shields.io/badge/Dataset-siren--screening-yellow.svg" alt="Dataset"/>
  </a>
  <a href="https://huggingface.co/Praise2112/siren-screening-crossencoder">
    <img src="https://img.shields.io/badge/Reranker-siren--screening--crossencoder-blue.svg" alt="Cross-encoder"/>
  </a>
  <img src="https://img.shields.io/badge/License-Apache_2.0-green.svg" alt="License"/>
</p>

A bi-encoder model for **systematic review screening**, trained to retrieve documents matching eligibility criteria. Achieves **+24 percentage points MRR@10** over MedCPT on criteria-based retrieval.

## Model Details

| Property | Value |
|----------|-------|
| Base Model | [Alibaba-NLP/gte-modernbert-base](https://huggingface.co/Alibaba-NLP/gte-modernbert-base) |
| Architecture | ModernBERT (22 layers, 768 hidden, 12 heads) |
| Parameters | ~149M |
| Max Sequence Length | 8192 tokens |
| Embedding Dimension | 768 |
| Training | Fine-tuned on [siren-screening](https://huggingface.co/datasets/Praise2112/siren-screening) + SLERP merged (t=0.4) |
| Pooling | Mean pooling |
## Intended Use

**Primary use case:** First-stage retrieval for systematic review screening pipelines.

Given eligibility criteria (e.g., "RCTs of aspirin in adults with diabetes, published after 2015"), retrieve candidate documents from a corpus for human review or downstream reranking.

**Recommended pipeline:**
1. Encode eligibility criteria with this bi-encoder
2. Retrieve top-k candidates via similarity search
3. Rerank with [siren-screening-crossencoder](https://huggingface.co/Praise2112/siren-screening-crossencoder) for 3-class classification
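Step 2 reduces to a cosine top-k over L2-normalized embeddings, so the dot product equals cosine similarity. A minimal sketch with placeholder vectors (real embeddings would come from `model.encode`):

```python
import numpy as np

def top_k(query_emb: np.ndarray, doc_embs: np.ndarray, k: int = 10):
    """Return indices and scores of the k most similar documents.

    Assumes all embeddings are L2-normalized, so dot product == cosine similarity.
    """
    scores = doc_embs @ query_emb   # (n_docs,)
    idx = np.argsort(-scores)[:k]   # highest-scoring first
    return idx, scores[idx]

# Toy stand-ins for model.encode() output: 100 random unit vectors
rng = np.random.default_rng(0)
docs = rng.normal(size=(100, 4))
docs /= np.linalg.norm(docs, axis=1, keepdims=True)
query = docs[42]  # make document 42 a perfect match

idx, scores = top_k(query, docs, k=5)
print(idx[0], float(scores[0]))  # document 42 ranks first with score 1.0
```

For large corpora the brute-force matrix product would typically be replaced by an approximate nearest-neighbor index (e.g., FAISS), but the ranking logic is the same.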
## Usage

### Sentence-Transformers

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Praise2112/siren-screening-biencoder")

# Encode eligibility criteria (query)
query = "Randomized controlled trials of aspirin for cardiovascular prevention in diabetic adults"
query_embedding = model.encode(query)

# Encode candidate documents
documents = [
    "A randomized trial of low-dose aspirin in 5,000 diabetic patients showed reduced MI risk...",
    "This cohort study examined statin use in elderly populations with hyperlipidemia...",
]
doc_embeddings = model.encode(documents)

# Compute similarity
similarities = model.similarity(query_embedding, doc_embeddings)
print(similarities)  # tensor([[0.85, 0.42]])
```
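Because training used Matryoshka dimensions (see Training below), embeddings can likely be truncated to a prefix and re-normalized with only modest quality loss; recent `sentence-transformers` releases also accept a `truncate_dim` argument for this. A minimal sketch of manual truncation on placeholder vectors (real ones would come from `model.encode`):

```python
import numpy as np

def truncate_and_renormalize(embs: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components and re-normalize each row to unit length."""
    shortened = embs[:, :dim]
    return shortened / np.linalg.norm(shortened, axis=1, keepdims=True)

# Placeholder for model.encode(...) output: 3 unit vectors of dimension 768
rng = np.random.default_rng(0)
full = rng.normal(size=(3, 768))
full /= np.linalg.norm(full, axis=1, keepdims=True)

small = truncate_and_renormalize(full, 256)
print(small.shape)  # (3, 256)
```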
### Transformers

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("Praise2112/siren-screening-biencoder")
model = AutoModel.from_pretrained("Praise2112/siren-screening-biencoder")

def encode(texts):
    inputs = tokenizer(texts, padding=True, truncation=True, max_length=8192, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # Mean pooling over non-padding tokens
    attention_mask = inputs["attention_mask"]
    embeddings = outputs.last_hidden_state
    mask_expanded = attention_mask.unsqueeze(-1).expand(embeddings.size()).float()
    sum_embeddings = torch.sum(embeddings * mask_expanded, dim=1)
    sum_mask = torch.clamp(mask_expanded.sum(dim=1), min=1e-9)
    return F.normalize(sum_embeddings / sum_mask, p=2, dim=1)

query_emb = encode(["RCTs of aspirin in diabetic patients"])
doc_emb = encode(["A randomized trial of aspirin in 5,000 diabetic patients..."])
similarity = torch.mm(query_emb, doc_emb.T)
```
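The masked mean pooling above can be sanity-checked on toy tensors (illustrative values only): padding positions contribute nothing to the average.

```python
import torch

# Two "sequences" of 3 tokens with 2-d hidden states; the second has one padding token
hidden = torch.tensor([[[1., 1.], [3., 3.], [5., 5.]],
                       [[2., 4.], [4., 2.], [9., 9.]]])
mask = torch.tensor([[1, 1, 1],
                     [1, 1, 0]])  # last token of sequence 2 is padding

mask_expanded = mask.unsqueeze(-1).expand(hidden.size()).float()
summed = torch.sum(hidden * mask_expanded, dim=1)
counts = torch.clamp(mask_expanded.sum(dim=1), min=1e-9)
pooled = summed / counts

# Row 1 averages all 3 tokens -> [3., 3.]; row 2 averages only the 2 real
# tokens -> [3., 3.], ignoring the padding vector [9., 9.]
print(pooled)
```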
## Performance

### Internal Benchmark (SIREN Test Set)

| Model | MRR@10 | 95% CI | R@10 | NDCG@10 |
|-------|--------|--------|------|---------|
| [MedCPT](https://huggingface.co/ncbi/MedCPT-Query-Encoder) | 0.697 | (0.69-0.71) | 0.889 | 0.744 |
| [PubMedBERT](https://huggingface.co/NeuML/pubmedbert-base-embeddings) | 0.781 | (0.77-0.79) | 0.945 | 0.821 |
| [BGE-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) | 0.866 | (0.86-0.87) | 0.976 | 0.894 |
| [GTE-ModernBERT-base](https://huggingface.co/Alibaba-NLP/gte-modernbert-base) | 0.861 | (0.86-0.87) | 0.974 | 0.889 |
| **SIREN (this model)** | **0.937** | (0.93-0.94) | **0.996** | **0.952** |
| SIREN + Cross-encoder | **0.952** | (0.95-0.96) | 0.997 | 0.963 |

This is +24 percentage points MRR@10 over [MedCPT](https://huggingface.co/ncbi/MedCPT-Query-Encoder), a leading biomedical retrieval model.

### External Benchmarks

| Benchmark | Metric | SIREN | [MedCPT](https://huggingface.co/ncbi/MedCPT-Query-Encoder) | [GTE-base](https://huggingface.co/Alibaba-NLP/gte-modernbert-base) |
|-----------|--------|-------|--------|----------|
| CLEF-TAR 2019 | NDCG@10 | **0.434** | 0.314 | 0.407 |
| CLEF-TAR 2019 | WSS@95 | 0.931 | 0.931 | 0.931 |
| SciFact (BEIR) | NDCG@10 | **0.770** | 0.708 | 0.753 |
| TRECCOVID (BEIR) | NDCG@10 | **0.674** | 0.567 | 0.655 |
| NFCorpus (BEIR) | NDCG@10 | **0.351** | 0.322 | 0.349 |
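For reference, NDCG@k scores a ranked list by discounting relevant hits logarithmically with rank and normalizing by the ideal ordering. A minimal sketch of the generic metric (not the official evaluation script used for these tables):

```python
import math

def ndcg_at_k(relevances, k=10):
    """NDCG@k for a ranked list of graded relevance labels."""
    rels = relevances[:k]
    dcg = sum(r / math.log2(i + 2) for i, r in enumerate(rels))
    ideal = sorted(relevances, reverse=True)[:k]
    idcg = sum(r / math.log2(i + 2) for i, r in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

# Relevant documents retrieved at ranks 1 and 3 out of 5
print(ndcg_at_k([1, 0, 1, 0, 0]))
```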
## Training

This model was created by:
1. Fine-tuning `gte-modernbert-base` on the [siren-screening](https://huggingface.co/datasets/Praise2112/siren-screening) dataset
2. SLERP merging with the base model (t=0.4) to preserve out-of-distribution generalization
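Spherical linear interpolation (SLERP) interpolates along the great-circle arc between two weight tensors rather than along the straight line. A generic per-tensor sketch on toy vectors (not the exact merge script used here):

```python
import numpy as np

def slerp(a: np.ndarray, b: np.ndarray, t: float, eps: float = 1e-8) -> np.ndarray:
    """Spherically interpolate between flattened weight tensors a and b.

    t=0 returns a, t=1 returns b; falls back to linear interpolation
    when the vectors are (nearly) colinear.
    """
    a_n = a / (np.linalg.norm(a) + eps)
    b_n = b / (np.linalg.norm(b) + eps)
    dot = np.clip(np.dot(a_n, b_n), -1.0, 1.0)
    theta = np.arccos(dot)
    if theta < 1e-6:  # nearly colinear: LERP is numerically safer
        return (1 - t) * a + t * b
    sin_theta = np.sin(theta)
    return (np.sin((1 - t) * theta) / sin_theta) * a + (np.sin(t * theta) / sin_theta) * b

base = np.array([1.0, 0.0])
finetuned = np.array([0.0, 1.0])
merged = slerp(base, finetuned, t=0.4)  # 40% of the way toward the fine-tuned weights
print(merged)
```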
**Training details:**
- Loss: Multiple Negatives Ranking Loss (MNRL) with in-batch negatives
- Batch size: 512 (via GradCache)
- Hard negatives: BM25-mined + LLM-generated
- Matryoshka dimensions: [768, 512, 256, 128, 64]
- Epochs: 1
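MNRL treats each query's paired positive as the target class and every other in-batch document as a negative, i.e. cross-entropy over the query-document similarity matrix. A minimal sketch on toy embeddings (the scale value is illustrative; this is not the exact training code):

```python
import torch
import torch.nn.functional as F

def mnrl(query_embs: torch.Tensor, doc_embs: torch.Tensor, scale: float = 20.0) -> torch.Tensor:
    """Multiple Negatives Ranking Loss with in-batch negatives.

    Row i of the scaled cosine-similarity matrix is softmaxed over all
    in-batch documents; the "correct class" is the paired document i.
    """
    q = F.normalize(query_embs, dim=1)
    d = F.normalize(doc_embs, dim=1)
    scores = scale * (q @ d.T)            # (batch, batch) similarity matrix
    labels = torch.arange(scores.size(0)) # positives lie on the diagonal
    return F.cross_entropy(scores, labels)

torch.manual_seed(0)
queries = torch.randn(8, 16)
positives = queries + 0.1 * torch.randn(8, 16)  # each positive resembles its query
loss = mnrl(queries, positives)
print(float(loss))
```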
## Limitations

- **Synthetic queries, real documents**: The queries and relevance labels are LLM-generated, but the documents are real PubMed articles
- **English only**: Trained on English PubMed articles
- **Not a classifier**: This model retrieves candidates; use the cross-encoder for relevance classification

## Citation

```bibtex
@misc{oketola2026siren,
  title={SIREN: Improving Systematic Review Screening with Synthetic Training Data for Neural Retrievers},
  author={Praise Oketola},
  year={2026},
  howpublished={\url{https://huggingface.co/Praise2112/siren-screening-biencoder}},
  note={Bi-encoder model}
}
```

## License

Apache 2.0