NeuroBERT
A sentence-transformers model optimized for neuroradiology reports. Maps sentences to 768-dimensional embeddings for semantic similarity tasks.
Overview
NeuroBERT is a RoBERTa-based model with a custom 10,000-word neuroradiology vocabulary trained from scratch. Standard BERT tokenization fragments medical terms (e.g., "hemorrhage" → "he", "morr", "hage"), so we trained a domain-specific WordPiece vocabulary to preserve neuroradiologic terminology.
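The fragmentation problem can be illustrated with a toy greedy longest-match-first WordPiece tokenizer. This is a minimal sketch, not NeuroBERT's tokenizer: the function `wordpiece_tokenize` and both vocabularies are hypothetical, chosen only to reproduce the "hemorrhage" example above.

```python
# Toy greedy longest-match-first WordPiece tokenizer (illustrative only).
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Split one word into the longest vocabulary pieces, left to right."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate span until it matches a vocabulary entry.
        while end > start:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation pieces carry the ## prefix
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk_token]  # no piece fits, so the whole word is unknown
        pieces.append(match)
        start = end
    return pieces


# Hypothetical vocabularies, NOT the real BERT or NeuroBERT vocabularies.
general_vocab = {"he", "##morr", "##hage"}
domain_vocab = general_vocab | {"hemorrhage"}

print(wordpiece_tokenize("hemorrhage", general_vocab))  # ['he', '##morr', '##hage']
print(wordpiece_tokenize("hemorrhage", domain_vocab))   # ['hemorrhage']
```

With the domain vocabulary the term survives as a single token, so the model learns one embedding for it instead of composing it from unrelated fragments.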
Training:
- Masked language modeling on neuroradiology reports (next sentence prediction was omitted because adjacent sentences in a report are often unrelated)
- Radiology section matching using a SentenceBERT twin-network architecture to align Findings and Summary sections from the same report
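One way to picture the section-matching step is as a pair-construction problem: Findings and Summary text from the same report form a positive pair, and text from different reports forms a negative pair. The sketch below is an assumption about how such pairs could be built. The card does not state the loss function or the negative-sampling scheme, and `build_section_pairs` and the example reports are hypothetical.

```python
def build_section_pairs(reports):
    """Return (findings, summary, label) triples.

    label 1.0 = both sections come from the same report,
    label 0.0 = the summary was taken from a different report.
    """
    pairs = []
    n = len(reports)
    for i, report in enumerate(reports):
        # Positive pair: sections of the same report should embed close together.
        pairs.append((report["findings"], report["summary"], 1.0))
        if n > 1:
            # Negative pair (assumed scheme): borrow the next report's summary.
            other = reports[(i + 1) % n]
            pairs.append((report["findings"], other["summary"], 0.0))
    return pairs


# Invented example reports, for illustration only.
example_reports = [
    {"findings": "restricted diffusion in the left paramedian ventral pons",
     "summary": "acute infarct"},
    {"findings": "the ventricles and extra cerebral csf spaces are of normal size",
     "summary": "normal intracranial appearances"},
]

for findings, summary, label in build_section_pairs(example_reports):
    print(label, "|", findings, "|", summary)
```

In a twin-network setup, both texts of each pair would pass through the same encoder, and a similarity-based loss would pull positive pairs together and push negative pairs apart.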
Usage
pip install -U sentence-transformers
from sentence_transformers import SentenceTransformer, util
model = SentenceTransformer('davvwood/NeuroBERT')
# Reference templates for normal findings
templates = [
    'normal study',
    'normal appearances of the brain',
    'no intracranial abnormality identified'
]
template_embeddings = model.encode(templates)
# Example reports
reports = [
    "mri head: there is restricted diffusion in the left paramedian ventral pons at the level of the middle cerebellar peduncle in keeping with an acute infarct.",
    "mri head: the ventricles and extra cerebral csf spaces are of normal size. no focal intracranial abnormality has been identified. conclusion: normal intracranial appearances"
]
for report in reports:
    report_embedding = model.encode(report)
    similarities = [util.cos_sim(t_emb, report_embedding).item() for t_emb in template_embeddings]
    print(f"Max similarity to normal templates: {max(similarities):.3f}")
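For readers who want to see what the similarity score measures, here is a dependency-free sketch of cosine similarity and of the "maximum over templates" step used in the loop above. The helper names `cosine_similarity` and `max_template_similarity` are hypothetical, and the 3-dimensional vectors are stand-ins for the model's 768-dimensional embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def max_template_similarity(report_vec, template_vecs):
    """Highest cosine similarity between a report and any template."""
    return max(cosine_similarity(t, report_vec) for t in template_vecs)


# Toy vectors standing in for real embeddings (illustration only).
toy_templates = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
toy_report = [2.0, 0.0, 0.0]

# Same direction as the first template, so the maximum is 1.0.
print(max_template_similarity(toy_report, toy_templates))
```

Because cosine similarity ignores vector length, a report pointing in the same direction as a template scores 1.0 regardless of magnitude, and an orthogonal one scores 0.0.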
Model Architecture
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with RobertaModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_mean_tokens': True})
)
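The Pooling layer with `pooling_mode_mean_tokens` turns per-token embeddings into a single sentence embedding by averaging them, with padding positions excluded via the attention mask. The following is a minimal pure-Python sketch of that idea, not the library's implementation. `mean_pool` is a hypothetical helper, and the 2-dimensional vectors stand in for the model's 768-dimensional token embeddings.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose attention-mask entry is 1."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:  # skip padding positions
            for j, value in enumerate(vec):
                total[j] += value
            count += 1
    count = max(count, 1)  # guard against an all-padding input
    return [t / count for t in total]


# Two real tokens followed by one padding token (toy values).
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]

print(mean_pool(tokens, mask))  # [2.0, 3.0]
```

The padding vector has no effect on the result, which is why sentences of different lengths in the same batch still receive comparable embeddings.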
Citation
If you use NeuroBERT, please cite:
@article{wood2025neurobert,
  title={Self-supervised Text-vision Alignment for Automated Brain MRI Abnormality Detection: A Multicenter Study (ALIGN Study)},
  author={Wood, D. A. and Guilhem, E. and Kafiabadi, S. and Al Busaidi, A. and Dissanayake, K. and Hammam, A. and others},
  journal={Radiology: Artificial Intelligence},
  pages={e240619},
  year={2025},
  doi={10.1148/ryai.240619}
}