X-Raydar NLP: Radiology Report Classifier

Pre-trained model weights for the NLP component of X-Raydar, from "Development and validation of open-source deep neural networks for comprehensive chest x-ray reading" (Cid, Macpherson et al., The Lancet Digital Health, 2024).

  • Website: x-raydar.info
  • Code: github.com/gmontana/xraydar-nlp
  • CV model: dnamodel/xraydar-cv

Model Description

RoBERTaX is a fine-tuned RoBERTa model that classifies free-text radiology reports into 45 finding categories using multi-label classification. It was trained on radiology reports from NHS hospitals.

Architecture

  • Base model: RoBERTa-base (125M parameters)
  • Classification head: Linear(768, 768) → Tanh → Linear(768, 45)
  • Output: 45 sigmoid probabilities (multi-label)
  • Input: Free-text radiology report (max 512 tokens)
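
The head dimensions and the multi-label output listed above can be checked with a little arithmetic. The sketch below uses only the standard library. It assumes both linear layers carry bias terms (the usual default, not stated in this card), and the logits are made-up values for illustration, not model output.

```python
import math

HIDDEN = 768      # input and hidden width of the head, from the list above
NUM_LABELS = 45   # number of finding categories

def linear_params(n_in, n_out, bias=True):
    """Parameter count of a dense layer; bias=True is an assumption."""
    return n_in * n_out + (n_out if bias else 0)

# Linear(768, 768) followed by Linear(768, 45); Tanh has no parameters
head_params = linear_params(HIDDEN, HIDDEN) + linear_params(HIDDEN, NUM_LABELS)

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

# Illustrative logits only. Each one is squashed independently, so the
# probabilities need not sum to 1 and several labels can be active at once.
example_logits = [2.0, -1.0, 0.5]
example_probs = [sigmoid(z) for z in example_logits]

print(head_params)
print([round(p, 3) for p in example_probs])
```

Under the bias assumption the head adds well under one million parameters on top of the roughly 125M in the base model.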

Files

File                              Description
nlp/robertax1.0.pt                Fine-tuned RoBERTaX state dict (45-label classifier)
nlp/pretrained/pytorch_model.bin  Pretrained RoBERTa base weights
nlp/pretrained/config.json        Model configuration
nlp/pretrained/vocab.json         Tokenizer vocabulary
nlp/pretrained/merges.txt         BPE merges file

Usage

Download weights

from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="dnamodel/xraydar-nlp",
    local_dir="./xraydar-nlp-weights"
)
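
After the download finishes, it can be useful to confirm that every file listed under Files is present. The following is a minimal sketch, assuming the snapshot keeps the repository's relative paths under the local directory; `missing_files` and `EXPECTED_FILES` are hypothetical names introduced here, not part of the X-Raydar code.

```python
from pathlib import Path

# Relative paths copied from the Files section of this card
EXPECTED_FILES = [
    "nlp/robertax1.0.pt",
    "nlp/pretrained/pytorch_model.bin",
    "nlp/pretrained/config.json",
    "nlp/pretrained/vocab.json",
    "nlp/pretrained/merges.txt",
]

def missing_files(root):
    """Return the expected files that are absent under `root` (hypothetical helper)."""
    root = Path(root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]

if __name__ == "__main__":
    missing = missing_files("./xraydar-nlp-weights")
    print("All files present" if not missing else f"Missing: {missing}")
```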

Copy weights into the code repository layout

import os
import shutil

from huggingface_hub import hf_hub_download

REPO_ID = "dnamodel/xraydar-nlp"

# Create the target directory (this also creates src/model if needed)
os.makedirs("src/model/robertax_pretrained", exist_ok=True)

# Fine-tuned RoBERTaX classifier weights
shutil.copy(
    hf_hub_download(REPO_ID, "nlp/robertax1.0.pt"),
    "src/model/robertax1.0.pt"
)

# Pretrained RoBERTa base weights, configuration, and tokenizer files
for f in ["pytorch_model.bin", "config.json", "vocab.json", "merges.txt"]:
    shutil.copy(
        hf_hub_download(REPO_ID, f"nlp/pretrained/{f}"),
        f"src/model/robertax_pretrained/{f}"
    )

See the code repository for full inference instructions.

NLP Labels (45 categories)

#   Label                               #   Label
0   abnormal_non_clinically_important   23  normal
1   aortic_calcification                24  object
2   apical_fibrosis                     25  other
3   atelectasis                         26  paraspinal_mass
4   axillary_abnormality                27  paratracheal_hilar_enlargement
5   bronchial_wall_thickening           28  parenchymal_lesion
6   bulla                               29  pleural_abnormality
7   cardiomegaly                        30  pleural_effusion
8   cavitating_lung_lesion              31  pneumomediastinum
9   clavicle_fracture                   32  pneumoperitoneum
10  comparison                          33  pneumothorax
11  consolidation                       34  possible_diagnosis
12  coronary_calcification              35  recommendation
13  dextrocardia                        36  rib_fracture
14  dilated_bowel                       37  rib_lesion
15  emphysema                           38  scoliosis
16  ground_glass_opacification          39  subcutaneous_emphysema
17  hemidiaphragm_elevated              40  technical_issue
18  hernia                              41  undefined_sentence
19  hyperexpanded_lungs                 42  unfolded_aorta
20  interstitial_shadowing              43  upper_lobe_blood_diversion
21  mediastinum_displaced               44  volume_loss
22  mediastinum_widened
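
To turn a 45-element probability vector into label names, the table above can be used as a lookup. This is a sketch under two assumptions that this card does not confirm: that the table index equals the position in the model's output vector, and that a single cutoff of 0.5 is acceptable (the authors may use different or per-label thresholds). `decode` is a hypothetical helper.

```python
# Label names in index order, transcribed from the table above
LABELS = [
    "abnormal_non_clinically_important", "aortic_calcification",
    "apical_fibrosis", "atelectasis", "axillary_abnormality",
    "bronchial_wall_thickening", "bulla", "cardiomegaly",
    "cavitating_lung_lesion", "clavicle_fracture", "comparison",
    "consolidation", "coronary_calcification", "dextrocardia",
    "dilated_bowel", "emphysema", "ground_glass_opacification",
    "hemidiaphragm_elevated", "hernia", "hyperexpanded_lungs",
    "interstitial_shadowing", "mediastinum_displaced",
    "mediastinum_widened", "normal", "object", "other",
    "paraspinal_mass", "paratracheal_hilar_enlargement",
    "parenchymal_lesion", "pleural_abnormality", "pleural_effusion",
    "pneumomediastinum", "pneumoperitoneum", "pneumothorax",
    "possible_diagnosis", "recommendation", "rib_fracture",
    "rib_lesion", "scoliosis", "subcutaneous_emphysema",
    "technical_issue", "undefined_sentence", "unfolded_aorta",
    "upper_lobe_blood_diversion", "volume_loss",
]

def decode(probs, threshold=0.5):
    """Return (label, probability) pairs at or above `threshold`.

    The 0.5 default is an illustrative assumption, not a documented setting.
    """
    if len(probs) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} probabilities, got {len(probs)}")
    return [(name, p) for name, p in zip(LABELS, probs) if p >= threshold]

# Made-up probabilities for illustration only
example = [0.0] * len(LABELS)
example[7] = 0.9    # cardiomegaly
example[30] = 0.6   # pleural_effusion
print(decode(example))
```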

Citation

@article{cid2024development,
  title={Development and validation of open-source deep neural networks for comprehensive chest x-ray reading: a retrospective, multicentre study},
  author={Dicente Cid, Yashin and Macpherson, Matthew and others},
  journal={The Lancet Digital Health},
  volume={6},
  number={1},
  pages={e44--e57},
  year={2024},
  publisher={Elsevier},
  doi={10.1016/S2589-7500(23)00218-2}
}

License

For academic research and non-commercial evaluation only. See x-raydar.info for terms and conditions.

Contact
