RadBERT German CTRate Classifier

A RadBERT-based multi-label classifier that predicts 18 pathology labels from German-language radiology reports.
The training data consists of German translations of reports from the CT-RATE dataset, produced with Qwen 2.5 9B.

Model Details

| Property | Value |
|---|---|
| Base model | RadBERT (RoBERTa-base architecture, pre-trained on radiology text) |
| Task | Multi-label text classification (18 labels) |
| Language | German (de) |
| Framework | 🤗 Transformers + PyTorch |
| Problem type | multi_label_classification |

Labels (18 pathologies)

| ID | Label |
|---|---|
| 0 | Medical material |
| 1 | Arterial wall calcification |
| 2 | Cardiomegaly |
| 3 | Pericardial effusion |
| 4 | Coronary artery wall calcification |
| 5 | Hiatal hernia |
| 6 | Lymphadenopathy |
| 7 | Emphysema |
| 8 | Atelectasis |
| 9 | Lung nodule |
| 10 | Lung opacity |
| 11 | Pulmonary fibrotic sequela |
| 12 | Pleural effusion |
| 13 | Mosaic attenuation pattern |
| 14 | Peribronchial thickening |
| 15 | Consolidation |
| 16 | Bronchiectasis |
| 17 | Interlobular septal thickening |
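For training or evaluation, each report's findings map onto an 18-dimensional multi-hot target vector following the label order above. A minimal sketch (the `LABELS` list and `multi_hot` helper are illustrative, not part of the released code):

```python
# Illustrative: encode a set of present pathologies as an 18-dim multi-hot
# vector, using the label order from the table above.
LABELS = [
    "Medical material", "Arterial wall calcification", "Cardiomegaly",
    "Pericardial effusion", "Coronary artery wall calcification",
    "Hiatal hernia", "Lymphadenopathy", "Emphysema", "Atelectasis",
    "Lung nodule", "Lung opacity", "Pulmonary fibrotic sequela",
    "Pleural effusion", "Mosaic attenuation pattern",
    "Peribronchial thickening", "Consolidation", "Bronchiectasis",
    "Interlobular septal thickening",
]

def multi_hot(present: set) -> list:
    """Map a set of present pathologies to a 0/1 vector over all 18 labels."""
    return [1.0 if name in present else 0.0 for name in LABELS]

vec = multi_hot({"Cardiomegaly", "Pleural effusion"})
# vec has 1.0 at indices 2 and 12, 0.0 elsewhere
```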

Quick Start

Installation

```bash
pip install transformers torch huggingface_hub
```

Loading the model

```python
import os
import sys

import torch
from huggingface_hub import hf_hub_download
from transformers import AutoConfig, AutoTokenizer

repo_id = "suitch/radbert-german-ctrate-classifier"

# Download the custom model class (or copy modeling_radbert.py locally).
# Its directory must be on sys.path *before* the import below.
modeling_path = hf_hub_download(repo_id=repo_id, filename="modeling_radbert.py")
sys.path.insert(0, os.path.dirname(modeling_path))
from modeling_radbert import RadBertForSequenceClassification

# Load config, model, and tokenizer
config = AutoConfig.from_pretrained(repo_id)
model = RadBertForSequenceClassification.from_pretrained(repo_id, config=config)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

model.eval()
```

Inference example

```python
# "The heart is slightly enlarged. There is a small left-sided pleural effusion."
text = "Das Herz ist leicht vergrößert. Es zeigt sich ein kleiner Pleuraerguss links."

inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)

with torch.no_grad():
    outputs = model(**inputs)

# The custom head may return raw logits or a standard output object.
logits = outputs.logits if hasattr(outputs, "logits") else outputs

probabilities = torch.sigmoid(logits).squeeze()
threshold = 0.5
predicted_labels = [
    config.id2label[i] for i, p in enumerate(probabilities) if p >= threshold
]

print("Predicted labels:", predicted_labels)
print("Probabilities:")
for i, p in enumerate(probabilities):
    print(f"  {config.id2label[i]}: {p:.4f}")
```
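The post-processing step is independent of the model: apply a per-label sigmoid, then keep labels at or above the threshold. A self-contained sketch in plain Python (the `id2label` mapping and logit values here are made up for demonstration):

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def rank_predictions(logits, id2label, threshold=0.5):
    """Return all per-label probabilities, plus the positive labels
    (probability >= threshold) sorted from most to least confident."""
    probs = {id2label[i]: sigmoid(z) for i, z in enumerate(logits)}
    positive = {
        name: p
        for name, p in sorted(probs.items(), key=lambda kv: -kv[1])
        if p >= threshold
    }
    return probs, positive

# Hypothetical 3-label example (the real model has 18 labels):
id2label = {0: "Cardiomegaly", 1: "Pleural effusion", 2: "Emphysema"}
probs, positive = rank_predictions([2.0, 0.5, -3.0], id2label)
# positive contains Cardiomegaly and Pleural effusion; Emphysema falls below 0.5
```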

Training Details

  • Base checkpoint: RadBERT (RoBERTa-base weights pre-trained on radiology corpora)
  • Training data: German translations of CTRate radiology reports (translated with Qwen 2.5 9B)
  • Classification head: Linear layer on top of the pooled first-token representation (`<s>`, RoBERTa's CLS-equivalent)
  • Loss: Binary Cross-Entropy with Logits (per-label sigmoid)
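The loss in the last bullet treats each of the 18 labels as an independent binary decision. A minimal pure-Python sketch of the computation, equivalent in spirit to `torch.nn.BCEWithLogitsLoss` averaged over labels (the logit and target values are illustrative):

```python
import math

def bce_with_logits(logits, targets):
    """Binary cross-entropy with logits, averaged over labels:
    each label gets its own sigmoid and its own binary CE term."""
    total = 0.0
    for z, y in zip(logits, targets):
        p = 1.0 / (1.0 + math.exp(-z))  # per-label sigmoid
        total += -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return total / len(logits)

# Two-label toy case: one confident true positive, one mild false lean.
loss = bce_with_logits([2.0, -1.0], [1.0, 0.0])
```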

Limitations

  • This model predicts labels from report text only; it does not process CT images.
  • It should not be treated as a clinical decision support system.
  • Performance is limited by the quality of the machine-translated training data.

Citation

If you use this model, please cite the CTRate dataset and RadBERT:

@article{hamamci2024ctrate,
  title={CT-RATE: A Large-Scale Computed Tomography Report-Image Dataset for AI in Radiology},
  author={Hamamci, Ibrahim Ethem and others},
  journal={arXiv preprint},
  year={2024}
}

@article{yan2022radbert,
  title={RadBERT: Adapting Transformer-based Language Models to Radiology},
  author={Yan, An and others},
  journal={Radiology: Artificial Intelligence},
  year={2022}
}

License

MIT
