---
license: other
base_model: DedalusHealthCare/tinybert-mlm-de
datasets:
- DedalusHealthCare/ner_demo_de
task_categories:
- token-classification
task_ids:
- named-entity-recognition
language:
- de
tags:
- token-classification
- ner
- named-entity-recognition
- de
- disorder_finding
library_name: transformers
pipeline_tag: token-classification
---
# TinyBERT for Demo NER (German)
## Model Description
This model is a fine-tuned TinyBERT model for Named Entity Recognition (NER) of DISORDER_FINDING entities in German medical texts.
It was fine-tuned from the [DedalusHealthCare/tinybert-mlm-de](https://huggingface.co/DedalusHealthCare/tinybert-mlm-de) masked language model using the [DedalusHealthCare/ner_demo_de](https://huggingface.co/datasets/DedalusHealthCare/ner_demo_de) dataset.

- **Base Model**: [DedalusHealthCare/tinybert-mlm-de](https://huggingface.co/DedalusHealthCare/tinybert-mlm-de)
- **Training Dataset**: [DedalusHealthCare/ner_demo_de](https://huggingface.co/datasets/DedalusHealthCare/ner_demo_de)
- **Task**: Token Classification (Named Entity Recognition)
- **Language**: German (de)
- **Entities**: DISORDER_FINDING
- **Model Format**: PyTorch and ONNX

**Please use `max` as the aggregation strategy in the NER pipeline (see the example below).**
## Training Details
- **Training epochs**: 1
- **Learning rate**: N/A
- **Training batch size**: 32
- **Evaluation batch size**: 32
- **Max sequence length**: 256
- **Warmup steps**: N/A
- **FP16**: False
- **Gradient accumulation steps**: 2
- **Evaluation accumulation steps**: 2
- **Save steps**: 15000
- **Evaluation steps**: 10000
- **Evaluation strategy**: steps
- **Random seed**: 33
- **Label all tokens**: True
- **Balanced training**: False
- **Chunk mode**: sliding_window
- **Stride**: 16
- **Max training samples**: None
- **Max evaluation samples**: 10000
- **Early stopping patience**: 0
- **Early stopping threshold**: 0.0
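
The **Chunk mode** and **Stride** settings indicate that long texts were split into overlapping windows of at most 256 tokens. The training code is not part of this card, so the following is only a minimal sketch under stated assumptions: the hypothetical helper `sliding_windows` treats `stride` as the number of tokens shared by consecutive windows (the meaning of `stride` in Hugging Face fast tokenizers) and ignores special tokens for simplicity.

```python
def sliding_windows(token_ids, max_length=256, stride=16):
    """Split a token-id sequence into overlapping windows.

    Assumption (hypothetical helper): `stride` is the number of tokens
    shared by consecutive windows, as in Hugging Face fast tokenizers.
    Special tokens ([CLS]/[SEP]) are ignored for simplicity.
    """
    if stride >= max_length:
        raise ValueError("stride must be smaller than max_length")
    step = max_length - stride  # how far each new window advances
    windows = []
    start = 0
    while True:
        windows.append(token_ids[start:start + max_length])
        if start + max_length >= len(token_ids):
            break  # the last window reaches the end of the sequence
        start += step
    return windows


# Example: 600 dummy token ids -> windows starting at 0, 240 and 480
windows = sliding_windows(list(range(600)))
print([(w[0], w[-1]) for w in windows])  # [(0, 255), (240, 495), (480, 599)]
```

Relatedly, with a training batch size of 32 and 2 gradient accumulation steps, each optimizer update sees 32 × 2 = 64 examples, assuming training ran on a single device.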
## Use Case Configuration
- **Use case name**: demo
- **Language**: German (de)
- **Target entities**: DISORDER_FINDING
- **Text processing max length**: N/A
- **Entity labeling scheme**: N/A
## Usage
### Using Transformers Pipeline
```python
from transformers import pipeline
# Load the model
ner_pipeline = pipeline(
    "ner",
    model="DedalusHealthCare/tinybert-ner-demo-de",
    tokenizer="DedalusHealthCare/tinybert-ner-demo-de",
    aggregation_strategy="max",
)
# Example text
text = "Der Patient hat Diabetes und Bluthochdruck."
# Get predictions
entities = ner_pipeline(text)
print(entities)
```
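
With an aggregation strategy set, the pipeline returns a list of dictionaries with keys such as `entity_group`, `score`, `word`, `start`, and `end`. A small post-processing step, sketched below, keeps only confident DISORDER_FINDING spans and reads their surface form from the original text via the character offsets, since the `word` field is reconstructed from tokens and may not match the input exactly (for example with an uncased tokenizer). The helper name `extract_entities`, the 0.5 threshold, and the sample dictionaries are illustrative placeholders, not real model output.

```python
def extract_entities(text, entities, label="DISORDER_FINDING", min_score=0.5):
    """Keep entities of one type above a score threshold (hypothetical helper).

    Expects the list of dicts returned by a token-classification pipeline
    with an aggregation strategy (keys: entity_group, score, start, end).
    """
    return [
        (text[entity["start"]:entity["end"]], float(entity["score"]))
        for entity in entities
        if entity["entity_group"] == label and entity["score"] >= min_score
    ]


text = "Der Patient hat Diabetes und Bluthochdruck."
# Placeholder dicts in the pipeline's output shape, NOT real predictions
entities = [
    {"entity_group": "DISORDER_FINDING", "score": 0.97, "word": "diabetes", "start": 16, "end": 24},
    {"entity_group": "DISORDER_FINDING", "score": 0.41, "word": "bluthochdruck", "start": 29, "end": 42},
]
print(extract_entities(text, entities))  # [('Diabetes', 0.97)]
```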
### Using AutoModel and AutoTokenizer
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

# Load model and tokenizer
model_name = "DedalusHealthCare/tinybert-ner-demo-de"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Tokenize text
text = "Der Patient hat Diabetes und Bluthochdruck."
tokens = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

# Get predictions (inference only, so no gradients are needed)
with torch.no_grad():
    outputs = model(**tokens)
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

# Map each sub-word token (including special tokens such as [CLS] and [SEP])
# to its most likely label
predicted_token_class_ids = predictions.argmax(dim=-1)[0]
labels = [model.config.id2label[class_id.item()] for class_id in predicted_token_class_ids]
subwords = tokenizer.convert_ids_to_tokens(tokens["input_ids"][0].tolist())
for subword, label in zip(subwords, labels):
    print(subword, label)
```
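
The AutoModel example yields one label per sub-word token, whereas the card recommends `aggregation_strategy="max"` for word-level output. The function below is a simplified sketch of that idea, not the library's implementation: for each word it keeps the label of the sub-token with the highest score. With a fast tokenizer, the word index of every token is available via the encoding's `word_ids()` method (`None` for special tokens), and a token's score can be taken as its maximum softmax probability. The pipeline additionally merges adjacent words with the same label into entity spans, which this sketch omits; the helper name and the label values in the example are hypothetical.

```python
def aggregate_max(word_ids, labels, scores):
    """Word-level labels: per word, the label of its highest-scoring sub-token.

    word_ids: word index per token, None for special tokens (as returned by
    a fast tokenizer's `word_ids()`); labels/scores: per-token argmax label
    and its probability.
    """
    best = {}  # word index -> (score, label) of the best sub-token so far
    for word_id, label, score in zip(word_ids, labels, scores):
        if word_id is None:  # skip special tokens such as [CLS]/[SEP]
            continue
        if word_id not in best or score > best[word_id][0]:
            best[word_id] = (score, label)
    return [best[word_id][1] for word_id in sorted(best)]


# Hypothetical sub-token predictions for three words; word 1 has three sub-tokens
word_ids = [None, 0, 1, 1, 1, 2, None]
labels = ["O", "O", "DISORDER_FINDING", "O", "DISORDER_FINDING", "O", "O"]
scores = [0.99, 0.98, 0.60, 0.70, 0.90, 0.95, 0.99]
print(aggregate_max(word_ids, labels, scores))  # ['O', 'DISORDER_FINDING', 'O']
```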
### Using ONNX Runtime (Optimized Inference)
```python
from optimum.onnxruntime import ORTModelForTokenClassification
from transformers import AutoTokenizer, pipeline
import torch

# Load the ONNX model for faster inference
model_name = "DedalusHealthCare/tinybert-ner-demo-de"
onnx_model = ORTModelForTokenClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Create a pipeline backed by the ONNX model (recommended)
ner_pipeline = pipeline(
    "ner",
    model=onnx_model,
    tokenizer=tokenizer,
    aggregation_strategy="max",
)

# Example text
text = "Der Patient hat Diabetes und Bluthochdruck."
entities = ner_pipeline(text)
print(entities)

# Direct model usage: one label per sub-word token
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
with torch.no_grad():
    outputs = onnx_model(**inputs)
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
predicted_token_class_ids = predictions.argmax(dim=-1)[0]
token_labels = [onnx_model.config.id2label[class_id.item()] for class_id in predicted_token_class_ids]
```
### Performance Comparison
- **PyTorch**: Standard format, suitable for training and research
- **ONNX**: Optimized for inference, typically 2-4x faster than PyTorch; the actual speed-up depends on hardware, batch size, and sequence length
- **Recommendation**: Use ONNX for production inference and PyTorch for research
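
To check the speed-up on your own hardware, a small timing helper such as the hypothetical `median_latency` below can be pointed at the PyTorch and ONNX pipelines from the usage examples. It is a rough sketch, not a rigorous benchmark: it runs a few warm-up calls to absorb one-off setup costs, then reports the median wall-clock time of repeated calls.

```python
import statistics
import time


def median_latency(predict, texts, warmup=3, repeats=20):
    """Median wall-clock seconds for one call of predict(texts) (hypothetical helper)."""
    for _ in range(warmup):  # warm-up calls absorb one-off setup costs
        predict(texts)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        predict(texts)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# Usage with the two pipelines (names are placeholders for the pipelines built
# in the PyTorch and ONNX examples):
# texts = ["Der Patient hat Diabetes und Bluthochdruck."] * 32
# t_pytorch = median_latency(pytorch_pipeline, texts)
# t_onnx = median_latency(onnx_pipeline, texts)
# print(f"ONNX speed-up: {t_pytorch / t_onnx:.2f}x")
```

Using the median rather than the mean keeps a single slow call (for example a garbage-collection pause) from skewing the comparison.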
## Model Architecture
This model is based on the TinyBERT architecture with a token classification head for Named Entity Recognition.
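
Assuming TinyBERT's BERT-style layout, the token classification head follows the pattern of `BertForTokenClassification` in `transformers`: a dropout layer followed by a linear layer that maps each token's final hidden state to one logit per label. The PyTorch sketch below shows that idea in isolation; it is not this model's code, and the hidden size of 312 and the two labels are illustrative assumptions (the actual values are in `model.config.hidden_size` and `model.config.id2label`).

```python
import torch
from torch import nn


class TokenClassificationHead(nn.Module):
    """Dropout + linear layer applied to every token's hidden state (illustrative)."""

    def __init__(self, hidden_size: int, num_labels: int, dropout: float = 0.1):
        super().__init__()
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # (batch, seq_len, hidden_size) -> (batch, seq_len, num_labels)
        return self.classifier(self.dropout(hidden_states))


# Assumed sizes: 312-dim hidden states and two IO labels (e.g. O / DISORDER_FINDING)
head = TokenClassificationHead(hidden_size=312, num_labels=2)
head.eval()  # disable dropout for inference
encoder_output = torch.randn(1, 10, 312)  # stand-in for the encoder's last hidden state
logits = head(encoder_output)
print(logits.shape)  # torch.Size([1, 10, 2])
```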
## Intended Use
This model is intended for:
- Named Entity Recognition in German medical texts
- Identification of DISORDER_FINDING entities
- Medical text processing and analysis
- Research and development in medical NLP
## Limitations
- Trained specifically for German medical texts
- Performance may vary on texts from different medical domains
- May not generalize well to non-medical texts
- Requires careful evaluation on new datasets
## Ethical Considerations
- This model is trained on medical data and should be used responsibly
- Outputs should be validated by medical professionals
- Patient privacy and data protection regulations must be followed
- The model may have biases present in the training data
## Model Performance
This model has been evaluated on the **goldset from ner_disorderfinding_de_goldset** using
IO evaluation (sklearn, token level, lenient) with the following results:
### Overall Performance
| Metric | Score |
|--------|-------|
| Precision (Macro) | 0.423825 |
| Recall (Macro) | 0.467183 |
| F1-Score (Macro) | 0.435170 |
| Precision (Weighted) | 0.599471 |
| Recall (Weighted) | 0.697989 |
| F1-Score (Weighted) | 0.640426 |
**Inference Performance**: 5.53 seconds on the evaluation dataset
### Entity-Level Performance (IO Evaluation)
| Entity Type | Precision | Recall | F1-Score | Support |
|-------------|-----------|--------|----------|---------|
| DISORDER_FINDING | 0.753533 | 0.900434 | 0.820460 | N/A |
### Evaluation Details
- **Dataset**: goldset from ner_disorderfinding_de_goldset
- **Dataset Source**: goldset
- **Evaluation Date**: 2025-11-03 12:25:56
- **Language**: de
- **Entities**: DISORDER_FINDING
*This evaluation section is automatically generated and updated.*
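
The overall table reports macro and weighted averages of per-label token scores. As a minimal sketch of how these two averages differ, assuming they follow scikit-learn's `average="macro"` and `average="weighted"` conventions (the exact label set and what "lenient" means are not documented here), the pure-Python function below computes per-label precision, recall, and F1 from flat lists of token labels, then averages them either uniformly (macro) or weighted by each label's gold support (weighted). The function name and the toy labels are hypothetical.

```python
from collections import Counter


def token_level_scores(y_true, y_pred):
    """Per-label precision/recall/F1 plus macro and support-weighted averages."""
    labels = sorted(set(y_true) | set(y_pred))
    true_positives = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    predicted = Counter(y_pred)  # tokens predicted per label
    gold = Counter(y_true)       # gold tokens (support) per label
    per_label = {}
    for label in labels:
        precision = true_positives[label] / predicted[label] if predicted[label] else 0.0
        recall = true_positives[label] / gold[label] if gold[label] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_label[label] = (precision, recall, f1)
    # Macro: plain mean over labels; weighted: mean weighted by gold support
    macro = tuple(sum(s[i] for s in per_label.values()) / len(labels) for i in range(3))
    weighted = tuple(
        sum(per_label[label][i] * gold[label] for label in labels) / len(y_true)
        for i in range(3)
    )
    return per_label, macro, weighted


# Toy IO-tagged tokens (hypothetical labels, not from the goldset)
y_true = ["O", "O", "DISORDER_FINDING", "DISORDER_FINDING", "O"]
y_pred = ["O", "DISORDER_FINDING", "DISORDER_FINDING", "O", "O"]
per_label, macro, weighted = token_level_scores(y_true, y_pred)
print(per_label["DISORDER_FINDING"])  # (0.5, 0.5, 0.5)
```

Because macro averaging gives every label the same weight, a rare or poorly predicted label lowers the macro score more than the support-weighted score.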
## Citation
If you use this model, please cite:
```bibtex
@misc{demo_de_ner_model,
  title = {TinyBERT for Demo NER (German)},
  author = {{DH Healthcare GmbH}},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/DedalusHealthCare/tinybert-ner-demo-de}
}
```
## License
This model is proprietary and owned by DH Healthcare GmbH. All rights reserved.
## Contact
For questions or support, please contact DH Healthcare GmbH.