---
language:
- de
license: mit
library_name: transformers
pipeline_tag: text-classification
tags:
- radiology
- medical-imaging
- chest-ct
- multi-label-classification
- radbert
- german
- ctrate
base_model: zzxslp/RadBERT-RoBERTa-4m
---

# RadBERT German CTRate Classifier

A **RadBERT**-based multi-label classifier for predicting 18 pathology labels from **German-language** radiology reports.  
It was trained on reports from the [CT-RATE](https://huggingface.co/datasets/ibrahimhamamci/CT-RATE) dataset that were machine-translated into German with Qwen 3.5 9B.

## Model Details

| Property | Value |
|---|---|
| **Base model** | RadBERT (RoBERTa-base architecture, pre-trained on radiology text) |
| **Task** | Multi-label text classification (18 labels) |
| **Language** | German (`de`) |
| **Framework** | 🤗 Transformers + PyTorch |
| **Problem type** | `multi_label_classification` |

## Labels (18 pathologies)

| ID | Label |
|----|-------|
| 0  | Medical material |
| 1  | Arterial wall calcification |
| 2  | Cardiomegaly |
| 3  | Pericardial effusion |
| 4  | Coronary artery wall calcification |
| 5  | Hiatal hernia |
| 6  | Lymphadenopathy |
| 7  | Emphysema |
| 8  | Atelectasis |
| 9  | Lung nodule |
| 10 | Lung opacity |
| 11 | Pulmonary fibrotic sequela |
| 12 | Pleural effusion |
| 13 | Mosaic attenuation pattern |
| 14 | Peribronchial thickening |
| 15 | Consolidation |
| 16 | Bronchiectasis |
| 17 | Interlobular septal thickening |
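The table fixes the order of the model's 18 outputs: position *i* of the logit vector corresponds to ID *i*. To prepare training targets or read predictions, you often need to convert between label names and an 18-element multi-hot vector. Below is a minimal plain-Python sketch. It assumes the checkpoint's `config.id2label` uses the same order as the table. The names `LABELS`, `to_multi_hot`, and `from_multi_hot` are illustrative and are not part of this repository.

```python
# Label order as listed in the table above (index = output position).
# Assumption: config.id2label in the released checkpoint uses this same order.
LABELS = [
    "Medical material",
    "Arterial wall calcification",
    "Cardiomegaly",
    "Pericardial effusion",
    "Coronary artery wall calcification",
    "Hiatal hernia",
    "Lymphadenopathy",
    "Emphysema",
    "Atelectasis",
    "Lung nodule",
    "Lung opacity",
    "Pulmonary fibrotic sequela",
    "Pleural effusion",
    "Mosaic attenuation pattern",
    "Peribronchial thickening",
    "Consolidation",
    "Bronchiectasis",
    "Interlobular septal thickening",
]
LABEL2ID = {name: i for i, name in enumerate(LABELS)}


def to_multi_hot(names):
    """Turn a collection of label names into an 18-element 0/1 target vector."""
    vector = [0] * len(LABELS)
    for name in names:
        vector[LABEL2ID[name]] = 1  # raises KeyError for unknown label names
    return vector


def from_multi_hot(vector):
    """Return the label names whose position in `vector` is set (non-zero)."""
    return [LABELS[i] for i, flag in enumerate(vector) if flag]


target = to_multi_hot(["Cardiomegaly", "Pleural effusion"])
print(target)
print(from_multi_hot(target))
```

A report can carry any number of these labels, including none. That is why the target is a multi-hot vector rather than a single class index.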

## Quick Start

### Installation

```bash
pip install transformers torch
```

### Loading the model

```python
import os
import sys

import torch
from huggingface_hub import hf_hub_download
from transformers import AutoConfig, AutoTokenizer

repo_id = "suitch/radbert-german-ctrate-classifier"

# Download the custom model class (or copy modeling_radbert.py locally) and
# make its directory importable *before* importing from it.
modeling_path = hf_hub_download(repo_id=repo_id, filename="modeling_radbert.py")
sys.path.insert(0, os.path.dirname(modeling_path))

from modeling_radbert import RadBertForSequenceClassification  # noqa: E402

# Load config, model, and tokenizer
config = AutoConfig.from_pretrained(repo_id)
model = RadBertForSequenceClassification.from_pretrained(repo_id, config=config)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

model.eval()  # disable dropout for inference
```

### Inference example

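```python
# "The heart is slightly enlarged. A small left-sided pleural effusion is present."
text = "Das Herz ist leicht vergrößert. Es zeigt sich ein kleiner Pleuraerguss links."

inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)

with torch.no_grad():
    outputs = model(**inputs)

# Accept either a raw logits tensor or a Transformers output object with `.logits`
logits = outputs.logits if hasattr(outputs, "logits") else outputs

# One independent sigmoid per label (multi-label), shape: (18,)
probabilities = torch.sigmoid(logits).squeeze(0).tolist()
threshold = 0.5
predicted_labels = [
    config.id2label[i] for i, p in enumerate(probabilities) if p >= threshold
]

print("Predicted labels:", predicted_labels)
print("Probabilities:")
for i, p in enumerate(probabilities):
    print(f"  {config.id2label[i]}: {p:.4f}")
```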
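You can also write the thresholding step without PyTorch. The sketch below defines a hypothetical helper, `decode_logits`, in plain Python. It applies a sigmoid to each logit independently and keeps the labels at or above the threshold, sorted by probability. The three-label mapping and the logits are made up for illustration and are not real model output.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode_logits(logits, id2label, threshold=0.5):
    """Map one report's raw logits to (label, probability) pairs.

    Each label is scored independently with a sigmoid (multi-label), so any
    number of labels, including none, can pass the threshold. Results are
    sorted by descending probability.
    """
    probs = [sigmoid(x) for x in logits]
    hits = [(id2label[i], p) for i, p in enumerate(probs) if p >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Toy example with 3 made-up logits, not real model output.
toy_id2label = {0: "Cardiomegaly", 1: "Pleural effusion", 2: "Emphysema"}
print(decode_logits([1.2, 2.5, -3.0], toy_id2label))
```

With the model loaded as above, you could call it as `decode_logits(logits.squeeze(0).tolist(), config.id2label)`. The sigmoid is monotonic, so a probability threshold of 0.5 is equivalent to keeping every label whose logit is at least 0. Choosing a different threshold moves that cut-off.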

## Training Details

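- **Base checkpoint**: RadBERT, [`zzxslp/RadBERT-RoBERTa-4m`](https://huggingface.co/zzxslp/RadBERT-RoBERTa-4m) (RoBERTa-base weights pre-trained on radiology corpora)
- **Training data**: German translations of CT-RATE radiology reports (machine-translated with Qwen 3.5 9B)
- **Classification head**: Linear layer on top of the first-token (`<s>`, RoBERTa's counterpart to `[CLS]`) / pooler output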
- **Loss**: Binary Cross-Entropy with Logits (per-label sigmoid)
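For a logit $x$ and a 0/1 target $y$, the per-label loss is $-[y \log \sigma(x) + (1-y)\log(1-\sigma(x))]$. The sketch below computes it in plain Python for a single report and averages over labels, which mirrors `torch.nn.BCEWithLogitsLoss` with its default `reduction="mean"`. It illustrates the loss only and is not the training code used for this model. The function name `bce_with_logits` is hypothetical.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over labels, computed from raw logits.

    Uses the numerically stable form
        max(x, 0) - x * y + log(1 + exp(-|x|)),
    which equals -[y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))]
    but never takes the log of a value that underflowed to 0.
    """
    if len(logits) != len(targets):
        raise ValueError("logits and targets must have the same length")
    total = 0.0
    for x, y in zip(logits, targets):
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


# One report with 3 labels for brevity (the real model has 18 outputs).
print(bce_with_logits([2.0, -1.0, 0.0], [1, 0, 1]))
```

For a batch, PyTorch's default averages over every (report, label) element, not per report.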

## Limitations

- This model infers labels from **report text only**; it does **not** process images.
- It should **not** be treated as a clinical decision support system.
- Performance is limited by the quality of the machine-translated training data.

## Citation

If you use this model, please cite the CT-RATE dataset and RadBERT:

```bibtex
@article{hamamci2024ctrate,
  title={Developing Generalist Foundation Models from a Multimodal Dataset for 3D Computed Tomography},
  author={Hamamci, Ibrahim Ethem and others},
  journal={arXiv preprint arXiv:2403.17834},
  year={2024}
}

@article{yan2022radbert,
  title={RadBERT: Adapting Transformer-based Language Models to Radiology},
  author={Yan, An and others},
  journal={Radiology: Artificial Intelligence},
  year={2022}
}
```

## License

MIT