NeuroBERT

A sentence-transformers model optimized for neuroradiology reports. It maps sentences to 768-dimensional dense embeddings for semantic similarity tasks.

Overview

NeuroBERT is a RoBERTa-based model with a custom 10,000-word neuroradiology vocabulary trained from scratch. Standard BERT tokenization fragments medical terms (e.g., "hemorrhage" → "he", "morr", "hage"), so we trained a domain-specific WordPiece vocabulary to preserve neuroradiologic terminology.
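The fragmentation problem is easier to see with a concrete tokenizer. Below is a minimal sketch of WordPiece-style tokenization: greedy longest-match-first, with "##" marking word-internal pieces, as in BERT. Both vocabularies are hypothetical toy sets chosen to reproduce the "hemorrhage" split described above. They are not NeuroBERT's actual vocabulary, and a real tokenizer also performs normalization and pre-tokenization, which this sketch omits.

```python
# Minimal WordPiece-style tokenizer: greedy longest-match-first over a vocabulary.
# Both vocabularies are HYPOTHETICAL toy examples, not NeuroBERT's real vocabulary.

def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Split one word into subword tokens, longest matching piece first.

    Pieces that do not start the word carry the conventional "##" prefix.
    """
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring, shrinking until a vocabulary match.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No piece matches: the whole word maps to the unknown token.
            return [unk]
        tokens.append(piece)
        start = end
    return tokens


# Toy "general-domain" vocabulary: no whole-word entry for the medical term.
general_vocab = {"he", "##morr", "##hage"}
# Toy domain vocabulary: the full term is a single entry.
domain_vocab = general_vocab | {"hemorrhage"}

print(wordpiece_tokenize("hemorrhage", general_vocab))  # ['he', '##morr', '##hage']
print(wordpiece_tokenize("hemorrhage", domain_vocab))   # ['hemorrhage']
```

Because the matcher always prefers the longest entry, adding whole domain terms to the vocabulary is enough to keep them as single tokens.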

Training:

Masked language modeling on neuroradiology reports (next-sentence prediction was omitted because adjacent sentences in a report are often unrelated)
Radiology section matching, using a Sentence-BERT twin-network architecture to align the Findings and Summary sections of the same report
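The section-matching objective can be sketched as follows. This assumes an in-batch-negatives contrastive loss, the approach taken by sentence-transformers' MultipleNegativesRankingLoss (default scale 20): each Findings embedding should be closer to the Summary from its own report than to the Summaries of the other reports in the batch. The card does not say which loss was actually used, and the 3-dimensional vectors are toy values standing in for encoder outputs.

```python
import math

# Sketch of a twin-network (Sentence-BERT style) section-matching objective.
# ASSUMPTION: in-batch-negatives softmax loss over scaled cosine similarities.
# The loss NeuroBERT actually used is not stated; embeddings are toy 3-d vectors.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def section_matching_loss(findings, summaries, scale=20.0):
    """Mean cross-entropy where findings[i] should match summaries[i].

    Every other summary in the batch acts as a negative for findings[i].
    Both lists are assumed to come from the same shared-weight encoder.
    """
    total = 0.0
    for i, f in enumerate(findings):
        logits = [scale * cosine(f, s) for s in summaries]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[i]
    return total / len(findings)

findings = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2]]
aligned = [[0.9, 0.2, 0.0], [0.1, 0.9, 0.1]]  # summaries in the correct order
shuffled = list(reversed(aligned))            # mismatched pairs

# Correctly paired sections give a lower loss than mismatched ones.
print(section_matching_loss(findings, aligned) < section_matching_loss(findings, shuffled))  # True
```

Because both sections pass through the same encoder, minimizing this loss shapes a single embedding space in which a report's findings and its summary land close together.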

Usage

pip install -U sentence-transformers

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('davvwood/NeuroBERT')

# Reference templates for normal findings
templates = [
    'normal study',
    'normal appearances of the brain',
    'no intracranial abnormality identified'
]
template_embeddings = model.encode(templates)

# Example reports
reports = [
    "mri head: there is restricted diffusion in the left paramedian ventral pons at the level of the middle cerebellar peduncle in keeping with an acute infarct.",
    "mri head: the ventricles and extra cerebral csf spaces are of normal size. no focal intracranial abnormality has been identified. conclusion: normal intracranial appearances"
]

for i, report in enumerate(reports, start=1):
    report_embedding = model.encode(report)
    # Cosine similarity between this report and each normal template
    similarities = [util.cos_sim(t_emb, report_embedding).item() for t_emb in template_embeddings]
    print(f"Report {i}: max similarity to normal templates: {max(similarities):.3f}")
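The loop above prints a similarity score but does not turn it into a decision. One way to do that is sketched below, assuming a cut-off chosen on labelled validation reports; the function name flag_abnormal and the 0.8 threshold are hypothetical. It uses plain lists of floats so it runs without downloading the model; with the real model you would pass the arrays returned by model.encode (converted with .tolist()).

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def flag_abnormal(report_emb, template_embs, threshold):
    """Return (is_abnormal, best_score) for one report (HYPOTHETICAL helper).

    The report is flagged abnormal when its best match to any "normal"
    template falls below the threshold. The threshold is an assumption
    and should be tuned on labelled data.
    """
    best = max(cosine(t, report_emb) for t in template_embs)
    return best < threshold, best

# Toy 2-d vectors standing in for model.encode(...) output.
templates = [[1.0, 0.0], [0.9, 0.1]]
print(flag_abnormal([0.95, 0.05], templates, threshold=0.8)[0])  # False: close to a normal template
print(flag_abnormal([0.0, 1.0], templates, threshold=0.8)[0])    # True: far from every template
```

With the real model, util.cos_sim(report_embedding, template_embeddings) computes all template similarities in one call, returning a 1 × 3 tensor for the three templates above.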

Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with RobertaModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_mean_tokens': True})
)
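The Pooling module turns RoBERTa's per-token outputs into one sentence vector. With 'pooling_mode_mean_tokens': True, that vector is the average of the token embeddings, excluding padding positions (attention mask 0). Below is a minimal pure-Python sketch using toy 2-dimensional vectors instead of the real 768-dimensional ones. Because max_seq_length is 512, tokens beyond that limit are truncated and never reach this step.

```python
# Mean pooling as configured by 'pooling_mode_mean_tokens': True:
# average the token embeddings, skipping padding positions (attention mask 0).
# Toy 2-d token vectors stand in for the real 768-d RoBERTa outputs.

def mean_pool(token_embeddings, attention_mask):
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            for j in range(dim):
                summed[j] += vec[j]
            count += 1
    # Avoid division by zero if every position is padding.
    count = max(count, 1)
    return [s / count for s in summed]

tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]  # last row is padding
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0]
```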

Citation

If you use NeuroBERT, please cite:

@article{wood2025neurobert,
  title={Self-supervised Text-vision Alignment for Automated Brain MRI Abnormality Detection: A Multicenter Study (ALIGN Study)},
  author={Wood, D. A. and Guilhem, E. and Kafiabadi, S. and Al Busaidi, A. and Dissanayake, K. and Hammam, A. and others},
  journal={Radiology: Artificial Intelligence},
  pages={e240619},
  year={2025},
  doi={10.1148/ryai.240619}
}

Paper: https://doi.org/10.1148/ryai.240619
