---
language:
- de
license: mit
library_name: transformers
pipeline_tag: text-classification
tags:
- radiology
- medical-imaging
- chest-ct
- multi-label-classification
- radbert
- german
- ctrate
base_model: zzxslp/RadBERT-RoBERTa-4m
---
# RadBERT German CTRate Classifier
A **RadBERT**-based multi-label classifier for predicting 18 pathology labels from **German-language** radiology reports.
The training data consists of German-translated reports from the [CTRate](https://huggingface.co/datasets/ibrahimhamamci/CT-RATE) dataset, translated using Qwen 3.5 9B.
## Model Details
| Property | Value |
|---|---|
| **Base model** | RadBERT (`zzxslp/RadBERT-RoBERTa-4m`; RoBERTa-base architecture, pre-trained on radiology text) |
| **Task** | Multi-label text classification (18 labels) |
| **Language** | German (`de`) |
| **Framework** | 🤗 Transformers + PyTorch |
| **Problem type** | `multi_label_classification` |
## Labels (18 pathologies)
| ID | Label |
|----|-------|
| 0 | Medical material |
| 1 | Arterial wall calcification |
| 2 | Cardiomegaly |
| 3 | Pericardial effusion |
| 4 | Coronary artery wall calcification |
| 5 | Hiatal hernia |
| 6 | Lymphadenopathy |
| 7 | Emphysema |
| 8 | Atelectasis |
| 9 | Lung nodule |
| 10 | Lung opacity |
| 11 | Pulmonary fibrotic sequela |
| 12 | Pleural effusion |
| 13 | Mosaic attenuation pattern |
| 14 | Peribronchial thickening |
| 15 | Consolidation |
| 16 | Bronchiectasis |
| 17 | Interlobular septal thickening |
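
When working with ground-truth annotations or saved predictions outside the model config, it can help to mirror the table above as a plain mapping. The following is a minimal sketch; the names `LABELS`, `ID2LABEL`, `LABEL2ID`, and `decode_multi_hot` are illustrative and not part of this repository.

```python
# Label order copied from the table above; list index == label ID.
LABELS = [
    "Medical material",
    "Arterial wall calcification",
    "Cardiomegaly",
    "Pericardial effusion",
    "Coronary artery wall calcification",
    "Hiatal hernia",
    "Lymphadenopathy",
    "Emphysema",
    "Atelectasis",
    "Lung nodule",
    "Lung opacity",
    "Pulmonary fibrotic sequela",
    "Pleural effusion",
    "Mosaic attenuation pattern",
    "Peribronchial thickening",
    "Consolidation",
    "Bronchiectasis",
    "Interlobular septal thickening",
]

ID2LABEL = dict(enumerate(LABELS))
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def decode_multi_hot(vector):
    """Return the label names whose entry in a 0/1 vector of length 18 is set."""
    if len(vector) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} entries, got {len(vector)}")
    return [LABELS[i] for i, flag in enumerate(vector) if flag]
```

In practice the authoritative mapping is the `id2label` entry of the model config; a hand-written copy like this one should be checked against it.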
## Quick Start
### Installation
```bash
pip install transformers torch
```
### Loading the model
```python
import os
import sys

import torch
from huggingface_hub import hf_hub_download
from transformers import AutoConfig, AutoTokenizer

repo_id = "suitch/radbert-german-ctrate-classifier"

# Download the custom model class (or copy modeling_radbert.py locally)
# and make its directory importable before importing from it.
modeling_path = hf_hub_download(repo_id=repo_id, filename="modeling_radbert.py")
sys.path.insert(0, os.path.dirname(modeling_path))
from modeling_radbert import RadBertForSequenceClassification  # noqa: E402

# Load config, model, and tokenizer
config = AutoConfig.from_pretrained(repo_id)
model = RadBertForSequenceClassification.from_pretrained(repo_id, config=config)
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model.eval()  # disable dropout for inference
```
### Inference example
```python
text = "Das Herz ist leicht vergrößert. Es zeigt sich ein kleiner Pleuraerguss links."
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)

with torch.no_grad():
    logits = model(**inputs)  # raw logits, one per label
    probabilities = torch.sigmoid(logits).squeeze()  # independent per-label probabilities

threshold = 0.5
predicted_labels = [
    config.id2label[i] for i, p in enumerate(probabilities) if p >= threshold
]
print("Predicted labels:", predicted_labels)
print("Probabilities:")
for i, p in enumerate(probabilities):
    print(f"  {config.id2label[i]}: {p:.4f}")
```
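
A single global threshold of 0.5 is not necessarily the best cut-off for every label. As a hedged sketch, the helper below applies optional per-label thresholds to a list of probabilities. The function name `select_labels` and any threshold values passed to it are hypothetical; no tuned thresholds ship with this model. It accepts any sequence of floats, for example `probabilities.tolist()` from the inference example.

```python
def select_labels(probabilities, id2label, default_threshold=0.5,
                  per_label_thresholds=None):
    """Return (label, probability) pairs at or above their threshold.

    per_label_thresholds maps label names to cut-offs; labels that are not
    listed fall back to default_threshold. Results are sorted by
    probability, highest first.
    """
    per_label_thresholds = per_label_thresholds or {}
    selected = []
    for idx, prob in enumerate(probabilities):
        name = id2label[idx]
        threshold = per_label_thresholds.get(name, default_threshold)
        if float(prob) >= threshold:
            selected.append((name, float(prob)))
    return sorted(selected, key=lambda pair: pair[1], reverse=True)


# Toy example with three made-up labels (not the real label set).
toy_id2label = {0: "A", 1: "B", 2: "C"}
print(select_labels([0.9, 0.4, 0.6], toy_id2label))
print(select_labels([0.9, 0.4, 0.6], toy_id2label,
                    per_label_thresholds={"B": 0.3, "C": 0.7}))
```

Any per-label thresholds would need to be chosen on a held-out validation set, which this card does not provide.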
## Training Details
- **Base checkpoint**: RadBERT (RoBERTa-base weights pre-trained on radiology corpora)
- **Training data**: German translations of CTRate radiology reports (translated with Qwen 3.5 9B)
- **Classification head**: Linear layer on top of the `<s>` token (RoBERTa's equivalent of `[CLS]`) / pooler output
- **Loss**: Binary Cross-Entropy with Logits (per-label sigmoid)
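
To make the loss concrete, here is a framework-free sketch of binary cross-entropy with logits for one report, averaged over its labels. It is an illustration of the formula, not the training code used for this model; the function name `bce_with_logits` is an assumption. For a single example it should agree with `torch.nn.BCEWithLogitsLoss` under its default mean reduction.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over labels, computed from raw logits.

    logits:  one real-valued score per label
    targets: one 0.0/1.0 ground-truth value per label
    """
    if len(logits) != len(targets):
        raise ValueError("logits and targets must have the same length")
    total = 0.0
    for x, y in zip(logits, targets):
        # Numerically stable form of
        #   -[y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))]
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


# A logit of 0 means probability 0.5, so the loss is log(2) for either target.
print(bce_with_logits([0.0], [1.0]))
```

Because each label gets its own sigmoid, the labels are scored independently, so a report can be positive for several findings at once.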
## Limitations
- This model is trained for **label inference from report text only**; it does **not** process images.
- It should **not** be treated as a clinical decision support system.
- Performance is limited by the quality of the machine-translated training data.
## Citation
If you use this model, please cite the CTRate dataset and RadBERT:
```bibtex
@article{hamamci2024ctrate,
title={CT-RATE: A Large-Scale Computed Tomography Report-Image Dataset for AI in Radiology},
author={Hamamci, Ibrahim Ethem and others},
journal={arXiv preprint},
year={2024}
}
@article{yan2022radbert,
title={RadBERT: Adapting Transformer-based Language Models to Radiology},
author={Yan, An and others},
journal={Radiology: Artificial Intelligence},
year={2022}
}
```
## License
MIT