X-Raydar NLP — Radiology Report Classifier

Pre-trained model weights for the NLP component of X-Raydar, from "Development and validation of open-source deep neural networks for comprehensive chest x-ray reading" (Cid, Macpherson et al., The Lancet Digital Health, 2024).

Website: x-raydar.info
Code: github.com/gmontana/xraydar-nlp
CV model: dnamodel/xraydar-cv

Model Description

RoBERTaX is a RoBERTa model fine-tuned for multi-label classification of free-text radiology reports into 45 finding categories. It was trained on radiology reports from NHS hospitals.

Architecture

Base model: RoBERTa-base (125M parameters)
Classification head: Linear(768, 768) → Tanh → Linear(768, 45)
Output: 45 sigmoid probabilities (multi-label)
Input: Free-text radiology report (max 512 tokens)
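
The multi-label output can be illustrated with a minimal sketch in plain Python. It assumes the classification head emits 45 raw logits and that a sigmoid is applied to each one independently, so the probabilities need not sum to 1 and several labels can be active at once. The 0.5 threshold and the function names are illustrative assumptions, not values taken from the X-Raydar code.

```python
import math

NUM_LABELS = 45  # size of the output layer, Linear(768, 45)


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def predict_indices(logits, threshold=0.5):
    """Return the indices of labels whose probability meets the threshold.

    `threshold` is a hypothetical default chosen for illustration only.
    """
    if len(logits) != NUM_LABELS:
        raise ValueError(f"expected {NUM_LABELS} logits, got {len(logits)}")
    probs = [sigmoid(v) for v in logits]
    return [i for i, p in enumerate(probs) if p >= threshold]


# Made-up logits: two labels clearly active, the rest clearly inactive.
logits = [-4.0] * NUM_LABELS
logits[7] = 2.0
logits[30] = 1.5
active = predict_indices(logits)
```

Because each label is scored independently, a single report can trigger zero, one, or many labels, which is the difference from a softmax classifier that picks exactly one class.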

Files

File	Description
`nlp/robertax1.0.pt`	Fine-tuned RoBERTaX state dict (45-label classifier)
`nlp/pretrained/pytorch_model.bin`	Pretrained RoBERTa base weights
`nlp/pretrained/config.json`	Model configuration
`nlp/pretrained/vocab.json`	Tokenizer vocabulary
`nlp/pretrained/merges.txt`	BPE merges file

Usage

Download weights

from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="dnamodel/xraydar-nlp",
    local_dir="./xraydar-nlp-weights"
)
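
After the download finishes, it can help to confirm that the five files listed under Files are present. The following is a minimal sketch using only the standard library. The helper name `missing_weight_files` is hypothetical, and the default directory is assumed to match the `local_dir` used above.

```python
from pathlib import Path

# Relative paths taken from the Files table of this model card.
EXPECTED_FILES = [
    "nlp/robertax1.0.pt",
    "nlp/pretrained/pytorch_model.bin",
    "nlp/pretrained/config.json",
    "nlp/pretrained/vocab.json",
    "nlp/pretrained/merges.txt",
]


def missing_weight_files(root="./xraydar-nlp-weights"):
    """Return the expected files that are not present under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


missing = missing_weight_files()
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All expected files are present.")
```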

Place weights for the code repository

import os
import shutil

from huggingface_hub import hf_hub_download

# Target directory for the pretrained RoBERTa files
os.makedirs("src/model/robertax_pretrained", exist_ok=True)

# Fine-tuned RoBERTaX state dict
shutil.copy(
    hf_hub_download("dnamodel/xraydar-nlp", "nlp/robertax1.0.pt"),
    "src/model/robertax1.0.pt"
)

# Pretrained base weights, model configuration, and tokenizer files
for f in ["pytorch_model.bin", "config.json", "vocab.json", "merges.txt"]:
    shutil.copy(
        hf_hub_download("dnamodel/xraydar-nlp", f"nlp/pretrained/{f}"),
        f"src/model/robertax_pretrained/{f}"
    )

See the code repository for full inference instructions.

NLP Labels (45 categories)

#	Label	#	Label
0	abnormal_non_clinically_important	23	normal
1	aortic_calcification	24	object
2	apical_fibrosis	25	other
3	atelectasis	26	paraspinal_mass
4	axillary_abnormality	27	paratracheal_hilar_enlargement
5	bronchial_wall_thickening	28	parenchymal_lesion
6	bulla	29	pleural_abnormality
7	cardiomegaly	30	pleural_effusion
8	cavitating_lung_lesion	31	pneumomediastinum
9	clavicle_fracture	32	pneumoperitoneum
10	comparison	33	pneumothorax
11	consolidation	34	possible_diagnosis
12	coronary_calcification	35	recommendation
13	dextrocardia	36	rib_fracture
14	dilated_bowel	37	rib_lesion
15	emphysema	38	scoliosis
16	ground_glass_opacification	39	subcutaneous_emphysema
17	hemidiaphragm_elevated	40	technical_issue
18	hernia	41	undefined_sentence
19	hyperexpanded_lungs	42	unfolded_aorta
20	interstitial_shadowing	43	upper_lobe_blood_diversion
21	mediastinum_displaced	44	volume_loss
22	mediastinum_widened
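
The table can be turned into a lookup from output index to label name. The sketch below assumes that the # column corresponds to the position in the model's 45-element output vector. The helper `top_findings` and the example probabilities are hypothetical and serve only to show the mapping.

```python
# Label names in index order, copied from the table above.
LABELS = [
    "abnormal_non_clinically_important", "aortic_calcification",
    "apical_fibrosis", "atelectasis", "axillary_abnormality",
    "bronchial_wall_thickening", "bulla", "cardiomegaly",
    "cavitating_lung_lesion", "clavicle_fracture", "comparison",
    "consolidation", "coronary_calcification", "dextrocardia",
    "dilated_bowel", "emphysema", "ground_glass_opacification",
    "hemidiaphragm_elevated", "hernia", "hyperexpanded_lungs",
    "interstitial_shadowing", "mediastinum_displaced",
    "mediastinum_widened", "normal", "object", "other",
    "paraspinal_mass", "paratracheal_hilar_enlargement",
    "parenchymal_lesion", "pleural_abnormality", "pleural_effusion",
    "pneumomediastinum", "pneumoperitoneum", "pneumothorax",
    "possible_diagnosis", "recommendation", "rib_fracture",
    "rib_lesion", "scoliosis", "subcutaneous_emphysema",
    "technical_issue", "undefined_sentence", "unfolded_aorta",
    "upper_lobe_blood_diversion", "volume_loss",
]


def top_findings(probs, k=3):
    """Return the k (label, probability) pairs with the highest probability."""
    if len(probs) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} values, got {len(probs)}")
    ranked = sorted(zip(LABELS, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up probabilities for illustration only.
probs = [0.01] * len(LABELS)
probs[7] = 0.92
probs[30] = 0.81
best = top_findings(probs, k=2)
```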

Citation

@article{cid2024development,
  title={Development and validation of open-source deep neural networks for comprehensive chest x-ray reading: a retrospective, multicentre study},
  author={Dicente Cid, Yashin and Macpherson, Matt and others},
  journal={The Lancet Digital Health},
  volume={6},
  number={1},
  pages={e44--e57},
  year={2024},
  publisher={Elsevier},
  doi={10.1016/S2589-7500(23)00218-2}
}

License

For academic research and non-commercial evaluation only. See x-raydar.info for terms and conditions.

Contact

Questions or collaborations: Giovanni Montana — g.montana@warwick.ac.uk
Commercial licensing: Warwick Ventures — ventures@warwick.ac.uk
