---
license: other
base_model: DedalusHealthCare/tinybert-mlm-de
datasets:
- DedalusHealthCare/ner_demo_de
task_categories:
- token-classification
task_ids:
- named-entity-recognition
language:
- de
tags:
- token-classification
- ner
- named-entity-recognition
- de
- disorder_finding
library_name: transformers
pipeline_tag: token-classification
---

# TinyBERT for Demo NER (German)

## Model Description

This is a TinyBERT model fine-tuned for Named Entity Recognition (NER) of DISORDER_FINDING entities in German medical texts.

It was fine-tuned from the [DedalusHealthCare/tinybert-mlm-de](https://huggingface.co/DedalusHealthCare/tinybert-mlm-de) masked language model using the [DedalusHealthCare/ner_demo_de](https://huggingface.co/datasets/DedalusHealthCare/ner_demo_de) dataset.

**Base Model**: [DedalusHealthCare/tinybert-mlm-de](https://huggingface.co/DedalusHealthCare/tinybert-mlm-de)

**Training Dataset**: [DedalusHealthCare/ner_demo_de](https://huggingface.co/datasets/DedalusHealthCare/ner_demo_de)

**Task**: Token Classification (Named Entity Recognition)

**Language**: German (de)

**Entities**: DISORDER_FINDING

**Model Formats**: PyTorch and ONNX

**Please use `max` as the aggregation strategy in the NER pipeline (see the examples below).**

## Training Details

- **Training epochs**: 1
- **Learning rate**: N/A
- **Training batch size**: 32
- **Evaluation batch size**: 32
- **Max sequence length**: 256
- **Warmup steps**: N/A
- **FP16**: False
- **Gradient accumulation steps**: 2
- **Evaluation accumulation steps**: 2
- **Save steps**: 15000
- **Evaluation steps**: 10000
- **Evaluation strategy**: steps
- **Random seed**: 33
- **Label all tokens**: True
- **Balanced training**: False
- **Chunk mode**: sliding_window
- **Stride**: 16
- **Max training samples**: None
- **Max evaluation samples**: 10000
- **Early stopping patience**: 0
- **Early stopping threshold**: 0.0
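Two of these settings shape how training examples are built. With a training batch size of 32 and 2 gradient accumulation steps, each optimizer update aggregates gradients over 32 × 2 = 64 examples (assuming a single device). The pure-Python sketch below illustrates the other two under stated assumptions: "Label all tokens: True" is read as the convention from the `transformers` token-classification examples, where every sub-token of a word receives that word's label (instead of only the first sub-token), and "Stride: 16" is read as the overlap between consecutive sliding windows, as with the `stride` argument of Hugging Face tokenizers. The function names are hypothetical, and a real pipeline would also reserve two of the 256 positions for `[CLS]`/`[SEP]`.

```python
from typing import Iterator, List, Optional, Tuple

IGNORE_INDEX = -100  # label id that PyTorch's CrossEntropyLoss skips by default


def align_labels(word_ids: List[Optional[int]], word_labels: List[int],
                 label_all_tokens: bool = True) -> List[int]:
    """Assign a label id to every sub-token, given the word index of each sub-token."""
    aligned = []
    previous_word = None
    for word_id in word_ids:
        if word_id is None:
            # Special tokens such as [CLS] and [SEP] do not contribute to the loss
            aligned.append(IGNORE_INDEX)
        elif word_id != previous_word or label_all_tokens:
            # First sub-token of a word, or any sub-token when label_all_tokens=True
            aligned.append(word_labels[word_id])
        else:
            # Continuation sub-token when only the first sub-token is labelled
            aligned.append(IGNORE_INDEX)
        previous_word = word_id
    return aligned


def sliding_windows(num_tokens: int, max_len: int = 256,
                    stride: int = 16) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) token spans of at most max_len; neighbours overlap by `stride` tokens."""
    if not 0 <= stride < max_len:
        raise ValueError("stride must be in [0, max_len)")
    step = max_len - stride
    start = 0
    while True:
        end = min(start + max_len, num_tokens)
        yield start, end
        if end >= num_tokens:
            break
        start += step


# A 600-token document becomes three overlapping windows
print(list(sliding_windows(600)))  # [(0, 256), (240, 496), (480, 600)]
# Word 1 is split into two sub-tokens; both get its label when label_all_tokens=True
print(align_labels([None, 0, 1, 1, 2, None], [0, 1, 0]))  # [-100, 0, 1, 1, 0, -100]
```

In practice, the `word_ids` input would come from a fast tokenizer's `BatchEncoding.word_ids()`.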

## Use Case Configuration

- **Use case name**: demo
- **Language**: German (de)
- **Target entities**: DISORDER_FINDING
- **Text processing max length**: N/A
- **Entity labeling scheme**: N/A

## Usage

### Using Transformers Pipeline

```python
from transformers import pipeline

# Load the model
ner_pipeline = pipeline(
    "ner",
    model="DedalusHealthCare/tinybert-ner-demo-de",
    tokenizer="DedalusHealthCare/tinybert-ner-demo-de",
    aggregation_strategy="max"
)

# Example text
text = "Der Patient hat Diabetes und Bluthochdruck."

# Get predictions
entities = ner_pipeline(text)
print(entities)
```
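The `max` aggregation strategy recommended above works at the word level: a word that the tokenizer splits into several sub-tokens receives the label of its highest-scoring sub-token, so a confident entity prediction on a later sub-token is not lost, as it can be with `first`. The pure-Python sketch below mimics that behaviour to show what the pipeline's grouped output means. It is a simplification, not the `transformers` implementation: it ignores character offsets and the special handling of `B-` tags, and the label names and probabilities are hypothetical (check `model.config.id2label` for the real labels). Adjacent words of the same type are merged and their scores averaged.

```python
from statistics import mean
from typing import Dict, List, Tuple

# One entry per sub-token of a word: (predicted label, probability of that label)
SubTokenPreds = List[Tuple[str, float]]


def entity_type(label: str) -> str:
    """Strip a "B-"/"I-" prefix so labels compare by entity type."""
    return label[2:] if label.startswith(("B-", "I-")) else label


def aggregate_max(words: List[str], token_preds: List[SubTokenPreds]) -> List[Dict]:
    groups: List[Dict] = []
    previous = None
    for word, preds in zip(words, token_preds):
        # "max" strategy: the word takes the label of its most confident sub-token
        label, score = max(preds, key=lambda pred: pred[1])
        label = entity_type(label)
        if label != "O":
            if label == previous:
                # An adjacent word of the same type extends the current entity
                groups[-1]["words"].append(word)
                groups[-1]["scores"].append(score)
            else:
                groups.append({"entity_group": label, "words": [word], "scores": [score]})
        previous = label
    return [
        {"entity_group": g["entity_group"], "word": " ".join(g["words"]), "score": mean(g["scores"])}
        for g in groups
    ]


# Hypothetical predictions: "Bluthochdruck" is split into two sub-tokens
words = ["Der", "Patient", "hat", "Diabetes", "und", "Bluthochdruck", "."]
preds = [[("O", 0.99)], [("O", 0.98)], [("O", 0.99)], [("I-DISORDER_FINDING", 0.95)],
         [("O", 0.97)], [("O", 0.60), ("I-DISORDER_FINDING", 0.90)], [("O", 0.99)]]
# Two DISORDER_FINDING entities: "Diabetes" and "Bluthochdruck"
for entity in aggregate_max(words, preds):
    print(entity)
```

With `first`, "Bluthochdruck" in this toy example would be labelled `O`, because its first sub-token is predicted as `O`.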

### Using AutoModel and AutoTokenizer

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

# Load model and tokenizer
model_name = "DedalusHealthCare/tinybert-ner-demo-de"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Tokenize text
text = "Der Patient hat Diabetes und Bluthochdruck."
inputs = tokenizer(text, return_tensors="pt", truncation=True)

# Get per-token class probabilities
with torch.no_grad():
    outputs = model(**inputs)
predictions = torch.softmax(outputs.logits, dim=-1)

# Map each sub-token (including special tokens such as [CLS]/[SEP]) to its predicted label
predicted_token_class_ids = predictions.argmax(dim=-1)[0]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
labels = [model.config.id2label[class_id.item()] for class_id in predicted_token_class_ids]
for token, label in zip(tokens, labels):
    print(token, label)
```

### Using ONNX Runtime (Optimized Inference)

```python
from optimum.onnxruntime import ORTModelForTokenClassification
from transformers import AutoTokenizer, pipeline
import torch

# Load ONNX model for faster inference
model_name = "DedalusHealthCare/tinybert-ner-demo-de"
onnx_model = ORTModelForTokenClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Create pipeline with ONNX model (recommended)
ner_pipeline = pipeline(
    "ner",
    model=onnx_model,
    tokenizer=tokenizer,
    aggregation_strategy="max"
)

# Example text
text = "Der Patient hat Diabetes und Bluthochdruck."
entities = ner_pipeline(text)
print(entities)

# Direct model usage: per-token labels (special tokens such as [CLS]/[SEP] included)
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    outputs = onnx_model(**inputs)
predictions = torch.softmax(outputs.logits, dim=-1)

predicted_token_class_ids = predictions.argmax(dim=-1)
token_labels = [onnx_model.config.id2label[class_id.item()] for class_id in predicted_token_class_ids[0]]
```

### Performance Comparison

- **PyTorch**: Standard format, suitable for training and research
- **ONNX**: Optimized for inference; typically 2-4x faster than PyTorch, although the actual speed-up depends on hardware, batch size, and sequence length
- **Recommendation**: Use ONNX for production inference, PyTorch for research
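Because the speed-up varies between machines, it is worth measuring on your own setup. A minimal latency harness, sketched under the assumption that both the PyTorch and the ONNX pipelines from the usage examples are loaded side by side under the hypothetical names `pytorch_pipeline` and `onnx_pipeline`: it makes a few untimed warm-up calls (early calls can include one-off setup costs) and reports the median of repeated timings, which is less sensitive to outliers than the mean.

```python
import statistics
import time
from typing import Callable


def median_latency(run: Callable[[], object], warmup: int = 3, repeats: int = 20) -> float:
    """Return the median wall-clock time in seconds of `run()`, measured after warm-up calls."""
    for _ in range(warmup):
        run()  # not timed: early calls may pay one-off setup costs
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# Hypothetical usage; `pytorch_pipeline` and `onnx_pipeline` stand for the two
# `ner_pipeline` objects from the usage examples, renamed so both can coexist:
# text = "Der Patient hat Diabetes und Bluthochdruck."
# pt_time = median_latency(lambda: pytorch_pipeline(text))
# onnx_time = median_latency(lambda: onnx_pipeline(text))
# print(f"ONNX speed-up: {pt_time / onnx_time:.1f}x")
```

Timing a single short sentence mostly measures per-call overhead; for a production estimate, time inputs whose length and batch size resemble your real workload.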

## Model Architecture

This model is based on the TinyBERT architecture with a token classification head for Named Entity Recognition.
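Concretely, a token classification head maps each token's final hidden state to one score per label. A minimal pure-Python sketch, assuming the standard `BertForTokenClassification` layout in `transformers` (dropout, which is inactive at inference time, followed by a single linear layer shared by all tokens). The toy hidden size of 2, the weights, and the label names are hypothetical, since this card does not state the model's hidden size or label set.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def linear(hidden: Vector, weight: List[Vector], bias: Vector) -> Vector:
    """logits[j] = sum_i weight[j][i] * hidden[i] + bias[j], as in torch.nn.Linear."""
    return [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(weight, bias)]


def softmax(logits: Vector) -> Vector:
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_tokens(hidden_states: List[Vector], weight: List[Vector], bias: Vector,
                    id2label: Dict[int, str]) -> List[Tuple[str, float]]:
    """Apply the same head to every token and return (label, probability) per token."""
    results = []
    for hidden in hidden_states:
        probs = softmax(linear(hidden, weight, bias))
        best = max(range(len(probs)), key=probs.__getitem__)
        results.append((id2label[best], probs[best]))
    return results


# Toy example: hidden size 2, two hypothetical labels
id2label = {0: "O", 1: "I-DISORDER_FINDING"}
weight = [[1.0, 0.0], [0.0, 1.0]]  # shape (num_labels, hidden_size)
bias = [0.0, 0.0]
hidden_states = [[2.0, 0.0], [0.0, 3.0]]  # one vector per token
print(classify_tokens(hidden_states, weight, bias, id2label))
```

In the real model, `hidden_states` are the encoder outputs for each sub-token and the weights are learned during fine-tuning; the argmax step matches the one in the AutoModel usage example.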

## Intended Use

This model is intended for:
- Named Entity Recognition in German medical texts
- Identification of DISORDER_FINDING entities
- Medical text processing and analysis
- Research and development in medical NLP

## Limitations

- Trained specifically for German medical texts
- Performance may vary on texts from different medical domains
- May not generalize well to non-medical texts
- Requires careful evaluation on new datasets

## Ethical Considerations

- This model is trained on medical data and should be used responsibly
- Outputs should be validated by medical professionals
- Patient privacy and data protection regulations must be followed
- The model may have biases present in the training data


## Model Performance

This model has been evaluated on the **goldset from ner_disorderfinding_de_goldset** using
IO evaluation (sklearn, token level, lenient) with the following results:

### Overall Performance

| Metric | Score |
|--------|-------|
| Precision (Macro) | 0.423825 |
| Recall (Macro) | 0.467183 |
| F1-Score (Macro) | 0.435170 |
| Precision (Weighted) | 0.599471 |
| Recall (Weighted) | 0.697989 |
| F1-Score (Weighted) | 0.640426 |

**Inference Performance**: 5.53 seconds for the evaluation dataset

### Entity-Level Performance (IO Evaluation)

| Entity Type | Precision | Recall | F1-Score | Support |
|-------------|-----------|--------|----------|---------|
| DISORDER_FINDING | 0.753533 | 0.900434 | 0.820460 | N/A |
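The macro and weighted rows in the overall table combine per-label scores differently: macro averaging takes the unweighted mean over labels, while weighted averaging weights each label by its support (number of gold tokens), so the two can differ when label frequencies are imbalanced. The sketch below computes token-level IO precision, recall, and F1 in pure Python, following the definitions behind scikit-learn's `precision_recall_fscore_support` with `average="macro"` and `average="weighted"` (undefined ratios are set to 0). It assumes BIO tags are collapsed to IO by dropping the `B-`/`I-` prefixes; the exact label set and the "lenient" matching rule behind the reported numbers are not documented here, so this sketch will not necessarily reproduce them, and the example tags are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def to_io(tag: str) -> str:
    """Collapse BIO to IO: "B-X" and "I-X" both become "X"; "O" is unchanged."""
    return tag[2:] if tag.startswith(("B-", "I-")) else tag


def per_label_scores(y_true: List[str], y_pred: List[str]) -> Dict[str, Dict[str, float]]:
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # the predicted label was wrong
            fn[gold] += 1  # the gold label was missed
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        p = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        r = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        scores[label] = {"precision": p, "recall": r, "f1": f1, "support": tp[label] + fn[label]}
    return scores


def averaged(scores: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    total_support = sum(s["support"] for s in scores.values())
    result = {}
    for metric in ("precision", "recall", "f1"):
        values = [(s[metric], s["support"]) for s in scores.values()]
        result[metric + "_macro"] = sum(v for v, _ in values) / len(values)
        result[metric + "_weighted"] = (
            sum(v * n for v, n in values) / total_support if total_support else 0.0
        )
    return result


# Hypothetical token-level tags for five tokens
gold = [to_io(t) for t in ["O", "O", "B-DISORDER_FINDING", "I-DISORDER_FINDING", "O"]]
pred = [to_io(t) for t in ["O", "I-DISORDER_FINDING", "I-DISORDER_FINDING", "O", "O"]]
scores = per_label_scores(gold, pred)
print(scores["DISORDER_FINDING"])
print(averaged(scores))
```

Here `DISORDER_FINDING` scores 0.5 and `O` scores 2/3 on every metric, so the macro average is 7/12 ≈ 0.583 while the support-weighted average is 0.6.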

### Evaluation Details

- **Dataset**: goldset from ner_disorderfinding_de_goldset
- **Dataset Source**: goldset
- **Evaluation Date**: 2025-11-03 12:25:56
- **Language**: de
- **Entities**: DISORDER_FINDING

*This evaluation section is automatically generated and updated.*

## Citation

If you use this model, please cite:

```bibtex
@misc{demo_de_ner_model,
  title = {TinyBERT for Demo NER (German)},
  author = {{DH Healthcare GmbH}},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/DedalusHealthCare/tinybert-ner-demo-de}
}
```

## License

This model is proprietary and owned by DH Healthcare GmbH. All rights reserved.

## Contact

For questions or support, please contact DH Healthcare GmbH.