RadBERT English CTRate Classifier

A RadBERT-based multi-label classifier for predicting 18 pathology labels from English-language radiology reports.
The training data consists of English radiology reports from the CTRate dataset.

Model Details

Property Value
Base model RadBERT (RoBERTa-base architecture, pre-trained on radiology text)
Task Multi-label text classification (18 labels)
Language English (en)
Framework 🤗 Transformers + PyTorch
Problem type multi_label_classification

Labels (18 pathologies)

ID Label
0 Medical material
1 Arterial wall calcification
2 Cardiomegaly
3 Pericardial effusion
4 Coronary artery wall calcification
5 Hiatal hernia
6 Lymphadenopathy
7 Emphysema
8 Atelectasis
9 Lung nodule
10 Lung opacity
11 Pulmonary fibrotic sequela
12 Pleural effusion
13 Mosaic attenuation pattern
14 Peribronchial thickening
15 Consolidation
16 Bronchiectasis
17 Interlobular septal thickening
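The table above defines the index-to-name mapping the classifier head uses. As a sketch (the names are taken directly from the table; the dict shapes follow the usual Transformers `id2label`/`label2id` convention, which this repo's config is assumed to match):

```python
# The 18 pathology names, in index order, as listed in the table above.
LABELS = [
    "Medical material",
    "Arterial wall calcification",
    "Cardiomegaly",
    "Pericardial effusion",
    "Coronary artery wall calcification",
    "Hiatal hernia",
    "Lymphadenopathy",
    "Emphysema",
    "Atelectasis",
    "Lung nodule",
    "Lung opacity",
    "Pulmonary fibrotic sequela",
    "Pleural effusion",
    "Mosaic attenuation pattern",
    "Peribronchial thickening",
    "Consolidation",
    "Bronchiectasis",
    "Interlobular septal thickening",
]

# id2label / label2id in the form Transformers configs conventionally use
id2label = {i: name for i, name in enumerate(LABELS)}
label2id = {name: i for i, name in enumerate(LABELS)}
```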

Quick Start

Installation

pip install transformers torch

Loading the model

from transformers import AutoTokenizer, AutoConfig
import torch

repo_id = "suitch/radbert-english-ctrate-classifier"

# Download the custom model class (or copy modeling_radbert.py locally)
# and put it on sys.path *before* importing it.
from huggingface_hub import hf_hub_download
import sys, os

modeling_path = hf_hub_download(repo_id=repo_id, filename="modeling_radbert.py")
sys.path.insert(0, os.path.dirname(modeling_path))

from modeling_radbert import RadBertForSequenceClassification

# Load config, model, and tokenizer
config = AutoConfig.from_pretrained(repo_id)
model = RadBertForSequenceClassification.from_pretrained(repo_id, config=config)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

model.eval()

Inference example

text = "The heart is mildly enlarged. A small left pleural effusion is noted."

inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)

with torch.no_grad():
    outputs = model(**inputs)

# The custom head may return raw logits or a ModelOutput; handle both.
logits = outputs if isinstance(outputs, torch.Tensor) else outputs.logits

probabilities = torch.sigmoid(logits).squeeze()
threshold = 0.5
predicted_labels = [
    config.id2label[i] for i, p in enumerate(probabilities) if p >= threshold
]

print("Predicted labels:", predicted_labels)
print("Probabilities:")
for i, p in enumerate(probabilities):
    print(f"  {config.id2label[i]}: {p:.4f}")

Training Details

  • Base checkpoint: RadBERT (RoBERTa-base weights pre-trained on radiology corpora)
  • Training data: English radiology reports from the CTRate dataset
  • Classification head: Linear layer on top of the [CLS] / pooler output
  • Loss: Binary Cross-Entropy with Logits (per-label sigmoid)
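The loss named above treats the 18 labels as independent binary problems. A minimal pure-Python sketch of the per-element computation (the numerically stable form that `torch.nn.BCEWithLogitsLoss` uses, shown here for illustration only):

```python
import math

def bce_with_logits(logit, target):
    """Numerically stable binary cross-entropy on a raw logit,
    matching torch.nn.BCEWithLogitsLoss for a single element:
    max(z, 0) - z*y + log(1 + exp(-|z|))."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

def multilabel_loss(logits, targets):
    """Mean BCE over the independent per-label sigmoids."""
    terms = [bce_with_logits(z, y) for z, y in zip(logits, targets)]
    return sum(terms) / len(terms)

# A confident correct prediction (logit 4.0, target 1) contributes little loss;
# the same logit against target 0 contributes much more.
low = bce_with_logits(4.0, 1.0)
high = bce_with_logits(4.0, 0.0)
```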

Limitations

  • This model is trained for label inference from report text only; it does not process images.
  • It should not be treated as a clinical decision support system.

Citation

If you use this model, please cite the CTRate dataset and RadBERT:

@article{hamamci2024ctrate,
  title={CT-RATE: A Large-Scale Computed Tomography Report-Image Dataset for AI in Radiology},
  author={Hamamci, Ibrahim Ethem and others},
  journal={arXiv preprint},
  year={2024}
}

@article{yan2022radbert,
  title={RadBERT: Adapting Transformer-based Language Models to Radiology},
  author={Yan, Di and others},
  journal={Radiology: Artificial Intelligence},
  year={2022}
}

License

MIT
