# RadBERT German CTRate Classifier

A RadBERT-based multi-label classifier that predicts 18 pathology labels from German-language radiology reports. The training data consists of German-translated reports from the CTRate dataset, translated using Qwen 2.5 9B.
## Model Details
| Property | Value |
|---|---|
| Base model | RadBERT (RoBERTa-base architecture, pre-trained on radiology text) |
| Task | Multi-label text classification (18 labels) |
| Language | German (de) |
| Framework | 🤗 Transformers + PyTorch |
| Problem type | multi_label_classification |
## Labels (18 pathologies)
| ID | Label |
|---|---|
| 0 | Medical material |
| 1 | Arterial wall calcification |
| 2 | Cardiomegaly |
| 3 | Pericardial effusion |
| 4 | Coronary artery wall calcification |
| 5 | Hiatal hernia |
| 6 | Lymphadenopathy |
| 7 | Emphysema |
| 8 | Atelectasis |
| 9 | Lung nodule |
| 10 | Lung opacity |
| 11 | Pulmonary fibrotic sequela |
| 12 | Pleural effusion |
| 13 | Mosaic attenuation pattern |
| 14 | Peribronchial thickening |
| 15 | Consolidation |
| 16 | Bronchiectasis |
| 17 | Interlobular septal thickening |
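The authoritative mapping ships as `config.id2label` in the model's `config.json`; the dict below is simply a transcription of the table above, useful when post-processing logits without loading the config:

```python
# Transcription of the 18-label table; the canonical source is config.id2label.
ID2LABEL = {
    0: "Medical material",
    1: "Arterial wall calcification",
    2: "Cardiomegaly",
    3: "Pericardial effusion",
    4: "Coronary artery wall calcification",
    5: "Hiatal hernia",
    6: "Lymphadenopathy",
    7: "Emphysema",
    8: "Atelectasis",
    9: "Lung nodule",
    10: "Lung opacity",
    11: "Pulmonary fibrotic sequela",
    12: "Pleural effusion",
    13: "Mosaic attenuation pattern",
    14: "Peribronchial thickening",
    15: "Consolidation",
    16: "Bronchiectasis",
    17: "Interlobular septal thickening",
}
LABEL2ID = {label: i for i, label in ID2LABEL.items()}
```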
## Quick Start

### Installation

```bash
pip install transformers torch huggingface_hub
```
### Loading the model

Note that `modeling_radbert.py` must be on `sys.path` *before* the custom model class is imported:

```python
import os
import sys

import torch
from huggingface_hub import hf_hub_download
from transformers import AutoConfig, AutoTokenizer

repo_id = "suitch/radbert-german-ctrate-classifier"

# Download the custom model class (or copy modeling_radbert.py locally)
modeling_path = hf_hub_download(repo_id=repo_id, filename="modeling_radbert.py")
sys.path.insert(0, os.path.dirname(modeling_path))
from modeling_radbert import RadBertForSequenceClassification  # noqa: E402

# Load config, model, and tokenizer
config = AutoConfig.from_pretrained(repo_id)
model = RadBertForSequenceClassification.from_pretrained(repo_id, config=config)
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model.eval()
```
### Inference example

```python
# "The heart is slightly enlarged. A small left pleural effusion is seen."
text = "Das Herz ist leicht vergrößert. Es zeigt sich ein kleiner Pleuraerguss links."

inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
with torch.no_grad():
    logits = model(**inputs)

probabilities = torch.sigmoid(logits).squeeze()

threshold = 0.5
predicted_labels = [
    config.id2label[i] for i, p in enumerate(probabilities) if p >= threshold
]

print("Predicted labels:", predicted_labels)
print("Probabilities:")
for i, p in enumerate(probabilities):
    print(f"  {config.id2label[i]}: {p:.4f}")
```
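For batched reports, the same thresholding applies row-wise. A self-contained sketch of the post-processing step, using random dummy logits in place of real model output:

```python
import torch

# Dummy logits for a batch of 2 reports over 18 labels; in practice these
# come from model(**inputs) as shown above.
torch.manual_seed(0)
logits = torch.randn(2, 18)

probs = torch.sigmoid(logits)   # per-label probabilities, shape (2, 18)
preds = probs >= 0.5            # boolean mask of predicted labels

# Indices of predicted labels per report, e.g. to map through config.id2label
label_ids = [row.nonzero(as_tuple=True)[0].tolist() for row in preds]
print(label_ids)
```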
## Training Details

- Base checkpoint: RadBERT (RoBERTa-base weights pre-trained on radiology corpora)
- Training data: German translations of CTRate radiology reports (translated with Qwen 2.5 9B)
- Classification head: linear layer on top of the `[CLS]`/pooler output
- Loss: binary cross-entropy with logits (per-label sigmoid)
## Limitations

- This model infers labels from report text only; it does not process images.
- It should not be treated as a clinical decision support system.
- Performance is limited by the quality of the machine-translated training data.
## Citation
If you use this model, please cite the CTRate dataset and RadBERT:
```bibtex
@article{hamamci2024ctrate,
  title   = {CT-RATE: A Large-Scale Computed Tomography Report-Image Dataset for AI in Radiology},
  author  = {Hamamci, Ibrahim Ethem and others},
  journal = {arXiv preprint},
  year    = {2024}
}

@article{yan2022radbert,
  title   = {RadBERT: Adapting Transformer-based Language Models to Radiology},
  author  = {Yan, Di and others},
  journal = {Radiology: Artificial Intelligence},
  year    = {2022}
}
```
## License

MIT
Base model: `zzxslp/RadBERT-RoBERTa-4m`