---
language:
- de
license: mit
library_name: transformers
pipeline_tag: text-classification
tags:
- radiology
- medical-imaging
- chest-ct
- multi-label-classification
- radbert
- german
- ctrate
base_model: zzxslp/RadBERT-RoBERTa-4m
---

# RadBERT German CTRate Classifier

A RadBERT-based multi-label classifier that predicts 18 pathology labels from German-language chest CT radiology reports.
The training data consists of reports from the [CT-RATE](https://huggingface.co/datasets/ibrahimhamamci/CT-RATE) dataset, machine-translated into German with Qwen 3.5 9B.

## Model Details

| Property | Value |
|---|---|
| Base model | RadBERT (`zzxslp/RadBERT-RoBERTa-4m`; RoBERTa-base architecture, pre-trained on radiology text) |
| Task | Multi-label text classification (18 labels) |
| Language | German (`de`) |
| Framework | 🤗 Transformers + PyTorch |
| Problem type | `multi_label_classification` |

## Labels (18 pathologies)

| ID | Label |
|----|-------|
| 0 | Medical material |
| 1 | Arterial wall calcification |
| 2 | Cardiomegaly |
| 3 | Pericardial effusion |
| 4 | Coronary artery wall calcification |
| 5 | Hiatal hernia |
| 6 | Lymphadenopathy |
| 7 | Emphysema |
| 8 | Atelectasis |
| 9 | Lung nodule |
| 10 | Lung opacity |
| 11 | Pulmonary fibrotic sequela |
| 12 | Pleural effusion |
| 13 | Mosaic attenuation pattern |
| 14 | Peribronchial thickening |
| 15 | Consolidation |
| 16 | Bronchiectasis |
| 17 | Interlobular septal thickening |
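
For training or evaluation, each report's findings are represented as an 18-element multi-hot vector in the ID order of the table above. The following is a minimal sketch, assuming the label strings match the table exactly (compare them against `config.id2label` before relying on this); `to_multi_hot` and `from_multi_hot` are illustrative helpers, not taken from this repository.

```python
# Label order copied from the table above; list index = label ID.
# Assumption: these strings match config.id2label exactly.
LABELS = [
    "Medical material",
    "Arterial wall calcification",
    "Cardiomegaly",
    "Pericardial effusion",
    "Coronary artery wall calcification",
    "Hiatal hernia",
    "Lymphadenopathy",
    "Emphysema",
    "Atelectasis",
    "Lung nodule",
    "Lung opacity",
    "Pulmonary fibrotic sequela",
    "Pleural effusion",
    "Mosaic attenuation pattern",
    "Peribronchial thickening",
    "Consolidation",
    "Bronchiectasis",
    "Interlobular septal thickening",
]
LABEL2ID = {name: i for i, name in enumerate(LABELS)}


def to_multi_hot(findings):
    """Encode a collection of label names as an 18-element 0/1 vector (hypothetical helper)."""
    vector = [0.0] * len(LABELS)
    for name in findings:
        vector[LABEL2ID[name]] = 1.0  # raises KeyError for an unknown label name
    return vector


def from_multi_hot(vector):
    """Decode a 0/1 vector back to label names in ID order (hypothetical helper)."""
    return [LABELS[i] for i, value in enumerate(vector) if value >= 0.5]
```

For example, `to_multi_hot({"Cardiomegaly", "Pleural effusion"})` sets positions 2 and 12 to `1.0`, and several findings can be active at once because the labels are not mutually exclusive.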

## Quick Start

### Installation

```bash
pip install transformers torch
```

### Loading the model

```python
import os
import sys

import torch
from huggingface_hub import hf_hub_download
from transformers import AutoConfig, AutoTokenizer

repo_id = "suitch/radbert-german-ctrate-classifier"

# Download the custom model class (or copy modeling_radbert.py locally)
# and put its directory on sys.path *before* importing from it.
modeling_path = hf_hub_download(repo_id=repo_id, filename="modeling_radbert.py")
sys.path.insert(0, os.path.dirname(modeling_path))

from modeling_radbert import RadBertForSequenceClassification  # noqa: E402

# Load config, model, and tokenizer
config = AutoConfig.from_pretrained(repo_id)
model = RadBertForSequenceClassification.from_pretrained(repo_id, config=config)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

model.eval()  # disable dropout for inference
```

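### Inference example

```python
text = "Das Herz ist leicht vergrößert. Es zeigt sich ein kleiner Pleuraerguss links."
# English: "The heart is slightly enlarged. There is a small left-sided pleural effusion."

inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)

with torch.no_grad():
    outputs = model(**inputs)

# Works whether the custom class returns a raw logits tensor or an output object.
logits = outputs.logits if hasattr(outputs, "logits") else outputs

# One independent sigmoid per label (multi-label); 18 values for a single report.
probabilities = torch.sigmoid(logits).squeeze(0).tolist()
threshold = 0.5
predicted_labels = [
    config.id2label[i] for i, p in enumerate(probabilities) if p >= threshold
]

print("Predicted labels:", predicted_labels)
print("Probabilities:")
for i, p in enumerate(probabilities):
    print(f"  {config.id2label[i]}: {p:.4f}")
```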
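
To score several reports at once, the tokenizer can be given a list of strings with `padding=True`, which yields one row of 18 logits per report. The sketch below uses a hypothetical `decode_predictions` helper (not shipped with this repository) that applies the same 0.5 threshold to each row and sorts the hits by probability. Only the decoding step runs as written; the model call is shown in comments and assumes the `model`, `tokenizer`, and `config` objects from the loading section.

```python
def decode_predictions(batch_probabilities, id2label, threshold=0.5):
    """Turn per-report sigmoid probabilities into sorted (label, probability) pairs.

    batch_probabilities: one list of per-label probabilities per report.
    id2label: mapping from label index to label name (e.g. config.id2label).
    """
    decoded = []
    for row in batch_probabilities:
        # Each label is compared against the threshold independently.
        hits = [(id2label[i], p) for i, p in enumerate(row) if p >= threshold]
        # Most confident findings first.
        hits.sort(key=lambda item: item[1], reverse=True)
        decoded.append(hits)
    return decoded


# Usage with the objects from the loading section (not executed here):
# reports = ["Das Herz ist leicht vergrößert.", "Kein Pleuraerguss."]
# batch = tokenizer(reports, return_tensors="pt", truncation=True,
#                   padding=True, max_length=512)
# with torch.no_grad():
#     outputs = model(**batch)
# logits = outputs.logits if hasattr(outputs, "logits") else outputs
# probs = torch.sigmoid(logits).tolist()  # shape: (num_reports, 18)
# print(decode_predictions(probs, config.id2label))
```

A single global threshold keeps the example simple; per-label thresholds tuned on validation data are a common alternative, but this card does not say whether any were used.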

## Training Details

- Base checkpoint: RadBERT (RoBERTa-base weights pre-trained on radiology corpora)
- Training data: German translations of CT-RATE radiology reports (translated with Qwen 3.5 9B)
- Classification head: linear layer on top of the `[CLS]`-position representation (RoBERTa's `<s>` token) / pooler output
- Loss: binary cross-entropy with logits (an independent sigmoid per label)
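
The loss above treats each of the 18 labels as an independent binary problem. As an illustration only (this card does not include the training code), the following pure-Python sketch computes the numerically stable per-label form that `torch.nn.BCEWithLogitsLoss` uses with its default `reduction="mean"`. Whether training used `pos_weight` or other class weighting is not stated, so none is applied; `bce_with_logits` and `naive_bce` are hypothetical names.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over labels, computed directly from logits.

    Uses the stable form max(x, 0) - x*y + log(1 + exp(-|x|)), which equals
    -[y*log(sigmoid(x)) + (1 - y)*log(1 - sigmoid(x))] without overflow.
    """
    if len(logits) != len(targets):
        raise ValueError("logits and targets must have the same length")
    total = 0.0
    for x, y in zip(logits, targets):
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


def naive_bce(logits, targets):
    """Reference version: sigmoid first, then BCE (overflows for large |x|)."""
    total = 0.0
    for x, y in zip(logits, targets):
        p = 1.0 / (1.0 + math.exp(-x))
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(logits)
```

For a logit of 0 the per-label loss is log 2 (about 0.693) regardless of the target, and the stable form still returns finite values for logits such as 1000, where `naive_bce` would fail on `log(0)`.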

## Limitations

- The model infers labels from report text only; it does not process images.
- It is not a clinical decision support system and should not be used as one.
- Performance is limited by the quality of the machine-translated training data.

## Citation

If you use this model, please cite the CT-RATE dataset and RadBERT:

```bibtex
@article{hamamci2024ctrate,
  title={Developing Generalist Foundation Models from a Multimodal Dataset for 3D Computed Tomography},
  author={Hamamci, Ibrahim Ethem and others},
  journal={arXiv preprint arXiv:2403.17834},
  year={2024}
}

@article{yan2022radbert,
  title={RadBERT: Adapting Transformer-based Language Models to Radiology},
  author={Yan, An and others},
  journal={Radiology: Artificial Intelligence},
  year={2022}
}
```

## License

MIT