X-Raydar NLP: Radiology Report Classifier
Pre-trained model weights for the NLP component of X-Raydar, from "Development and validation of open-source deep neural networks for comprehensive chest x-ray reading" (Cid, Macpherson et al., The Lancet Digital Health, 2024).
Website: x-raydar.info | Code: github.com/gmontana/xraydar-nlp | CV model: dnamodel/xraydar-cv
Model Description
RoBERTaX is a RoBERTa model fine-tuned for multi-label classification: it assigns each free-text radiology report to any number of 45 finding categories. It was trained on radiology reports from NHS hospitals.
Architecture
- Base model: RoBERTa-base (125M parameters)
- Classification head: Linear(768, 768) → Tanh → Linear(768, 45)
- Output: 45 sigmoid probabilities (multi-label)
- Input: Free-text radiology report (max 512 tokens)
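To make the output format concrete, the following is a minimal sketch of how the 45 raw outputs of the final linear layer (logits) become independent per-label probabilities. It uses only the Python standard library. The function names and the 0.5 decision threshold are illustrative assumptions, not values taken from the X-Raydar code.

```python
import math

NUM_LABELS = 45  # width of the final Linear(768, 45) layer


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def logits_to_probabilities(logits):
    """Apply a sigmoid to each logit independently (multi-label, not softmax)."""
    if len(logits) != NUM_LABELS:
        raise ValueError(f"expected {NUM_LABELS} logits, got {len(logits)}")
    return [sigmoid(x) for x in logits]


def predict_labels(logits, threshold=0.5):
    """Return the label indices whose probability meets the assumed threshold."""
    probs = logits_to_probabilities(logits)
    return [i for i, p in enumerate(probs) if p >= threshold]


# Example: strong evidence for labels 7 and 30, little for everything else.
example_logits = [-4.0] * NUM_LABELS
example_logits[7] = 3.0
example_logits[30] = 1.5
print(predict_labels(example_logits))  # prints [7, 30]
```

Because each label has its own sigmoid, the probabilities need not sum to 1, and a single report can be assigned several findings at once.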
Files
| File | Description |
|---|---|
| `nlp/robertax1.0.pt` | Fine-tuned RoBERTaX state dict (45-label classifier) |
| `nlp/pretrained/pytorch_model.bin` | Pretrained RoBERTa base weights |
| `nlp/pretrained/config.json` | Model configuration |
| `nlp/pretrained/vocab.json` | Tokenizer vocabulary |
| `nlp/pretrained/merges.txt` | BPE merges file |
Usage
Download weights
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="dnamodel/xraydar-nlp",
    local_dir="./xraydar-nlp-weights",
)
Place weights for the code repository
import os
import shutil

from huggingface_hub import hf_hub_download

# Create the directory layout used by the code repository.
os.makedirs("src/model/robertax_pretrained", exist_ok=True)

# Fine-tuned RoBERTaX classifier weights.
shutil.copy(
    hf_hub_download("dnamodel/xraydar-nlp", "nlp/robertax1.0.pt"),
    "src/model/robertax1.0.pt",
)

# Pretrained RoBERTa base weights, configuration, and tokenizer files.
for f in ["pytorch_model.bin", "config.json", "vocab.json", "merges.txt"]:
    shutil.copy(
        hf_hub_download("dnamodel/xraydar-nlp", f"nlp/pretrained/{f}"),
        f"src/model/robertax_pretrained/{f}",
    )
See the code repository for full inference instructions.
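Before running inference, it can help to confirm that the copy step put every file where it is expected. The sketch below checks the five destination paths used in the copy code above. The helper name `missing_weight_files` and the empty-file check are illustrative additions, not part of the X-Raydar repository.

```python
from pathlib import Path

# Paths (relative to src/model) that the copy step above should create.
EXPECTED_FILES = [
    "robertax1.0.pt",
    "robertax_pretrained/pytorch_model.bin",
    "robertax_pretrained/config.json",
    "robertax_pretrained/vocab.json",
    "robertax_pretrained/merges.txt",
]


def missing_weight_files(model_dir="src/model"):
    """Return the expected files that are absent or empty under model_dir."""
    root = Path(model_dir)
    missing = []
    for rel in EXPECTED_FILES:
        path = root / rel
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(str(path))
    return missing


if __name__ == "__main__":
    missing = missing_weight_files()
    if missing:
        print("Missing or empty files:")
        for path in missing:
            print(f"  {path}")
    else:
        print("All expected weight files are in place.")
```

Run it from the root of the code repository so that the relative `src/model` path resolves correctly.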
NLP Labels (45 categories)
| # | Label | # | Label |
|---|---|---|---|
| 0 | abnormal_non_clinically_important | 23 | normal |
| 1 | aortic_calcification | 24 | object |
| 2 | apical_fibrosis | 25 | other |
| 3 | atelectasis | 26 | paraspinal_mass |
| 4 | axillary_abnormality | 27 | paratracheal_hilar_enlargement |
| 5 | bronchial_wall_thickening | 28 | parenchymal_lesion |
| 6 | bulla | 29 | pleural_abnormality |
| 7 | cardiomegaly | 30 | pleural_effusion |
| 8 | cavitating_lung_lesion | 31 | pneumomediastinum |
| 9 | clavicle_fracture | 32 | pneumoperitoneum |
| 10 | comparison | 33 | pneumothorax |
| 11 | consolidation | 34 | possible_diagnosis |
| 12 | coronary_calcification | 35 | recommendation |
| 13 | dextrocardia | 36 | rib_fracture |
| 14 | dilated_bowel | 37 | rib_lesion |
| 15 | emphysema | 38 | scoliosis |
| 16 | ground_glass_opacification | 39 | subcutaneous_emphysema |
| 17 | hemidiaphragm_elevated | 40 | technical_issue |
| 18 | hernia | 41 | undefined_sentence |
| 19 | hyperexpanded_lungs | 42 | unfolded_aorta |
| 20 | interstitial_shadowing | 43 | upper_lobe_blood_diversion |
| 21 | mediastinum_displaced | 44 | volume_loss |
| 22 | mediastinum_widened | | |
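The table can be turned into a lookup for reading model outputs by name. The sketch below transcribes the labels in index order and pairs each probability at or above a threshold with its label. It assumes the model's output positions follow the `#` column above. The function name `name_findings` and the 0.5 threshold are illustrative choices.

```python
# Index-to-name mapping, transcribed from the label table (indices 0 to 44).
LABELS = [
    "abnormal_non_clinically_important", "aortic_calcification",
    "apical_fibrosis", "atelectasis", "axillary_abnormality",
    "bronchial_wall_thickening", "bulla", "cardiomegaly",
    "cavitating_lung_lesion", "clavicle_fracture", "comparison",
    "consolidation", "coronary_calcification", "dextrocardia",
    "dilated_bowel", "emphysema", "ground_glass_opacification",
    "hemidiaphragm_elevated", "hernia", "hyperexpanded_lungs",
    "interstitial_shadowing", "mediastinum_displaced",
    "mediastinum_widened", "normal", "object", "other",
    "paraspinal_mass", "paratracheal_hilar_enlargement",
    "parenchymal_lesion", "pleural_abnormality", "pleural_effusion",
    "pneumomediastinum", "pneumoperitoneum", "pneumothorax",
    "possible_diagnosis", "recommendation", "rib_fracture", "rib_lesion",
    "scoliosis", "subcutaneous_emphysema", "technical_issue",
    "undefined_sentence", "unfolded_aorta", "upper_lobe_blood_diversion",
    "volume_loss",
]


def name_findings(probabilities, threshold=0.5):
    """Return (label, probability) pairs at or above threshold, highest first."""
    if len(probabilities) != len(LABELS):
        raise ValueError(
            f"expected {len(LABELS)} probabilities, got {len(probabilities)}"
        )
    hits = [(LABELS[i], p) for i, p in enumerate(probabilities) if p >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Example with made-up probabilities for a single report.
probs = [0.01] * len(LABELS)
probs[33] = 0.97
probs[36] = 0.64
print(name_findings(probs))  # prints [('pneumothorax', 0.97), ('rib_fracture', 0.64)]
```

Note that several entries (`comparison`, `recommendation`, `technical_issue`, `undefined_sentence`) are not named as radiological findings, so downstream code may want to treat them separately.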
Citation
@article{cid2024development,
title={Development and validation of open-source deep neural networks for comprehensive chest x-ray reading: a retrospective, multicentre study},
author={Dicente Cid, Yashin and Macpherson, Matt and others},
journal={The Lancet Digital Health},
volume={6},
number={1},
pages={e44--e57},
year={2024},
publisher={Elsevier},
doi={10.1016/S2589-7500(23)00218-2}
}
License
For academic research and non-commercial evaluation only. See x-raydar.info for terms and conditions.
Contact
- Questions or collaborations: Giovanni Montana, g.montana@warwick.ac.uk
- Commercial licensing: Warwick Ventures, ventures@warwick.ac.uk