RadBERT English CTRate Classifier

A RadBERT-based multi-label classifier for predicting 18 pathology labels from English-language radiology reports.
The training data consists of English radiology reports from the CTRate dataset.

Model Details

Property Value
Base model RadBERT (RoBERTa-base architecture, pre-trained on radiology text)
Task Multi-label text classification (18 labels)
Language English (en)
Framework 🤗 Transformers + PyTorch
Problem type multi_label_classification

Labels (18 pathologies)

ID Label
0 Medical material
1 Arterial wall calcification
2 Cardiomegaly
3 Pericardial effusion
4 Coronary artery wall calcification
5 Hiatal hernia
6 Lymphadenopathy
7 Emphysema
8 Atelectasis
9 Lung nodule
10 Lung opacity
11 Pulmonary fibrotic sequela
12 Pleural effusion
13 Mosaic attenuation pattern
14 Peribronchial thickening
15 Consolidation
16 Bronchiectasis
17 Interlobular septal thickening
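The table above defines the index-to-name mapping the classifier head uses. As a sketch (the names are taken directly from the table; the dict shapes follow the usual Transformers `id2label`/`label2id` convention, which this repo's config is assumed to match):

```python
# The 18 pathology names, in index order, as listed in the table above.
LABELS = [
    "Medical material",
    "Arterial wall calcification",
    "Cardiomegaly",
    "Pericardial effusion",
    "Coronary artery wall calcification",
    "Hiatal hernia",
    "Lymphadenopathy",
    "Emphysema",
    "Atelectasis",
    "Lung nodule",
    "Lung opacity",
    "Pulmonary fibrotic sequela",
    "Pleural effusion",
    "Mosaic attenuation pattern",
    "Peribronchial thickening",
    "Consolidation",
    "Bronchiectasis",
    "Interlobular septal thickening",
]

# id2label / label2id in the form Transformers configs conventionally use
id2label = {i: name for i, name in enumerate(LABELS)}
label2id = {name: i for i, name in enumerate(LABELS)}
```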

Quick Start

Installation

pip install transformers torch

Loading the model

from transformers import AutoTokenizer, AutoConfig
import torch

repo_id = "suitch/radbert-english-ctrate-classifier"

# Download the custom model class (or copy modeling_radbert.py locally)
# and put it on sys.path *before* importing it.
from huggingface_hub import hf_hub_download
import sys, os

modeling_path = hf_hub_download(repo_id=repo_id, filename="modeling_radbert.py")
sys.path.insert(0, os.path.dirname(modeling_path))

from modeling_radbert import RadBertForSequenceClassification

# Load config, model, and tokenizer
config = AutoConfig.from_pretrained(repo_id)
model = RadBertForSequenceClassification.from_pretrained(repo_id, config=config)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

model.eval()

Inference example

text = "The heart is mildly enlarged. A small left pleural effusion is noted."

inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)

with torch.no_grad():
    outputs = model(**inputs)

# The custom head may return raw logits or a ModelOutput; handle both.
logits = outputs if isinstance(outputs, torch.Tensor) else outputs.logits

probabilities = torch.sigmoid(logits).squeeze()
threshold = 0.5
predicted_labels = [
    config.id2label[i] for i, p in enumerate(probabilities) if p >= threshold
]

print("Predicted labels:", predicted_labels)
print("Probabilities:")
for i, p in enumerate(probabilities):
    print(f"  {config.id2label[i]}: {p:.4f}")

Training Details

  • Base checkpoint: RadBERT (RoBERTa-base weights pre-trained on radiology corpora)
  • Training data: English radiology reports from the CTRate dataset
  • Classification head: Linear layer on top of the [CLS] / pooler output
  • Loss: Binary Cross-Entropy with Logits (per-label sigmoid)
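The loss named above treats the 18 labels as independent binary problems. A minimal pure-Python sketch of the per-element computation (the numerically stable form that `torch.nn.BCEWithLogitsLoss` uses, shown here for illustration only):

```python
import math

def bce_with_logits(logit, target):
    """Numerically stable binary cross-entropy on a raw logit,
    matching torch.nn.BCEWithLogitsLoss for a single element:
    max(z, 0) - z*y + log(1 + exp(-|z|))."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

def multilabel_loss(logits, targets):
    """Mean BCE over the independent per-label sigmoids."""
    terms = [bce_with_logits(z, y) for z, y in zip(logits, targets)]
    return sum(terms) / len(terms)

# A confident correct prediction (logit 4.0, target 1) contributes little loss;
# the same logit against target 0 contributes much more.
low = bce_with_logits(4.0, 1.0)
high = bce_with_logits(4.0, 0.0)
```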

Limitations

  • This model is trained for label inference from report text only; it does not process images.
  • It should not be treated as a clinical decision support system.

Citation

If you use this model, please cite the CTRate dataset and RadBERT:

@article{hamamci2024ctrate,
  title={CT-RATE: A Large-Scale Computed Tomography Report-Image Dataset for AI in Radiology},
  author={Hamamci, Ibrahim Ethem and others},
  journal={arXiv preprint},
  year={2024}
}

@article{yan2022radbert,
  title={RadBERT: Adapting Transformer-based Language Models to Radiology},
  author={Yan, Di and others},
  journal={Radiology: Artificial Intelligence},
  year={2022}
}

License

MIT
