---
language:
- de
license: mit
library_name: transformers
pipeline_tag: text-classification
tags:
- radiology
- medical-imaging
- chest-ct
- multi-label-classification
- radbert
- german
- ctrate
base_model: zzxslp/RadBERT-RoBERTa-4m
---

# RadBERT German CTRate Classifier

A **RadBERT**-based multi-label classifier that predicts 18 pathology labels from **German-language** radiology reports. The training data consists of reports from the [CTRate](https://huggingface.co/datasets/ibrahimhamamci/CT-RATE) dataset, machine-translated into German with Qwen 3.5 9B.

## Model Details

| Property | Value |
|---|---|
| **Base model** | RadBERT (RoBERTa-base architecture, pre-trained on radiology text) |
| **Task** | Multi-label text classification (18 labels) |
| **Language** | German (`de`) |
| **Framework** | 🤗 Transformers + PyTorch |
| **Problem type** | `multi_label_classification` |

## Labels (18 pathologies)

| ID | Label |
|----|-------|
| 0 | Medical material |
| 1 | Arterial wall calcification |
| 2 | Cardiomegaly |
| 3 | Pericardial effusion |
| 4 | Coronary artery wall calcification |
| 5 | Hiatal hernia |
| 6 | Lymphadenopathy |
| 7 | Emphysema |
| 8 | Atelectasis |
| 9 | Lung nodule |
| 10 | Lung opacity |
| 11 | Pulmonary fibrotic sequela |
| 12 | Pleural effusion |
| 13 | Mosaic attenuation pattern |
| 14 | Peribronchial thickening |
| 15 | Consolidation |
| 16 | Bronchiectasis |
| 17 | Interlobular septal thickening |

## Quick Start

### Installation

```bash
pip install transformers torch
```

### Loading the model

```python
import os
import sys

import torch
from huggingface_hub import hf_hub_download
from transformers import AutoConfig, AutoTokenizer

repo_id = "suitch/radbert-german-ctrate-classifier"

# Download the custom model class (or copy modeling_radbert.py locally)
# and put its directory on sys.path before importing from it.
modeling_path = hf_hub_download(repo_id=repo_id, filename="modeling_radbert.py")
sys.path.insert(0, os.path.dirname(modeling_path))

from modeling_radbert import RadBertForSequenceClassification  # noqa: E402

# Load config, model, and tokenizer
config = AutoConfig.from_pretrained(repo_id)
model = RadBertForSequenceClassification.from_pretrained(repo_id, config=config)
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model.eval()
```

### Inference example

```python
# German input; in English: "The heart is slightly enlarged.
# There is a small pleural effusion on the left."
text = "Das Herz ist leicht vergrößert. Es zeigt sich ein kleiner Pleuraerguss links."

inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)

with torch.no_grad():
    logits = model(**inputs)

# One independent sigmoid per label (multi-label, not softmax)
probabilities = torch.sigmoid(logits).squeeze()

threshold = 0.5
predicted_labels = [
    config.id2label[i] for i, p in enumerate(probabilities) if p >= threshold
]

print("Predicted labels:", predicted_labels)
print("Probabilities:")
for i, p in enumerate(probabilities):
    print(f"  {config.id2label[i]}: {p:.4f}")
```

## Training Details

- **Base checkpoint**: RadBERT (RoBERTa-base weights pre-trained on radiology corpora)
- **Training data**: German translations of CTRate radiology reports (translated with Qwen 3.5 9B)
- **Classification head**: Linear layer on top of the `[CLS]` / pooler output
- **Loss**: Binary Cross-Entropy with Logits (per-label sigmoid)

## Limitations

- This model is trained for **label inference from report text only**; it does **not** process images.
- It should **not** be treated as a clinical decision support system.
- Performance is limited by the quality of the machine-translated training data.

## Citation

If you use this model, please cite the CTRate dataset and RadBERT:

```bibtex
@article{hamamci2024ctrate,
  title={CT-RATE: A Large-Scale Computed Tomography Report-Image Dataset for AI in Radiology},
  author={Hamamci, Ibrahim Ethem and others},
  journal={arXiv preprint},
  year={2024}
}

@article{yan2022radbert,
  title={RadBERT: Adapting Transformer-based Language Models to Radiology},
  author={Yan, Di and others},
  journal={Radiology: Artificial Intelligence},
  year={2022}
}
```

## License

MIT
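
## Appendix: Multi-label loss sketch

Training Details names binary cross-entropy with logits as the loss. The following is a minimal, dependency-free sketch of what that loss computes for a single report, assuming one independent sigmoid per label and a mean over labels. The training script is not part of this card, so the function name `bce_with_logits` and the three-label example values are hypothetical; the real model has 18 labels.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over labels, computed from raw logits.

    Uses the numerically stable form
        max(z, 0) - z * y + log(1 + exp(-|z|))
    which equals -[y * log(sigmoid(z)) + (1 - y) * log(1 - sigmoid(z))].
    """
    if len(logits) != len(targets):
        raise ValueError("logits and targets must have the same length")
    total = 0.0
    for z, y in zip(logits, targets):
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


# Hypothetical 3-label example (illustrative values, not model output).
logits = [2.0, -1.0, 0.0]   # raw scores from the classification head
targets = [1.0, 0.0, 1.0]   # 1.0 = label present in the report
loss = bce_with_logits(logits, targets)
print(f"loss = {loss:.4f}")
```

Because every label contributes its own term, a report can be positive for several pathologies at once, which is why inference applies a per-label sigmoid and threshold rather than a softmax. With PyTorch, `torch.nn.BCEWithLogitsLoss` with its default mean reduction computes the same quantity.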