---
license: other
base_model: DedalusHealthCare/tinybert-mlm-de
datasets:
- DedalusHealthCare/ner_demo_de
task_categories:
- token-classification
task_ids:
- named-entity-recognition
language:
- de
tags:
- token-classification
- ner
- named-entity-recognition
- de
- disorder_finding
library_name: transformers
pipeline_tag: token-classification
---

# TinyBERT for Demo NER (German)

## Model Description

This model is a fine-tuned TinyBERT model for Named Entity Recognition (NER) of DISORDER_FINDING entities in German medical texts. It was fine-tuned from the [DedalusHealthCare/tinybert-mlm-de](https://huggingface.co/DedalusHealthCare/tinybert-mlm-de) masked language model on the [DedalusHealthCare/ner_demo_de](https://huggingface.co/datasets/DedalusHealthCare/ner_demo_de) dataset.

- **Base Model**: [DedalusHealthCare/tinybert-mlm-de](https://huggingface.co/DedalusHealthCare/tinybert-mlm-de)
- **Training Dataset**: [DedalusHealthCare/ner_demo_de](https://huggingface.co/datasets/DedalusHealthCare/ner_demo_de)
- **Task**: Token Classification (Named Entity Recognition)
- **Language**: German (de)
- **Entities**: DISORDER_FINDING
- **Model Format**: PyTorch and ONNX

**Please use `max` as the aggregation strategy in the NER pipeline (see the example below).**
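The `max` strategy decides a word's label from its subword tokens: the token with the highest top-label score determines the label (and score) for the whole word. Below is a minimal pure-Python sketch of that rule. The function name, the two-label IO label set (`O`, `DISORDER_FINDING`) and the probabilities are hypothetical; the model's real labels are in `model.config.id2label`. The Transformers pipeline additionally derives word boundaries from the tokenizer and merges adjacent words into entity spans, which this sketch omits.

```python
from typing import Dict, List, Tuple

# Hypothetical IO label set, for illustration only.
LABELS = ["O", "DISORDER_FINDING"]


def aggregate_max(
    word_ids: List[int],
    token_probs: List[List[float]],
) -> List[Tuple[int, str, float]]:
    """Return one (word_id, label, score) per word: among each word's
    subword tokens, the token with the highest top probability wins."""
    best: Dict[int, Tuple[str, float]] = {}
    for wid, probs in zip(word_ids, token_probs):
        score = max(probs)                    # this token's top probability
        label = LABELS[probs.index(score)]    # ...and the label it belongs to
        if wid not in best or score > best[wid][1]:
            best[wid] = (label, score)
    return [(wid, label, score) for wid, (label, score) in sorted(best.items())]


# Word 0 = "hat"; word 1 = a word split into three subwords (made-up split).
# Its first subword leans "O", but the most confident subword says DISORDER_FINDING.
print(aggregate_max(
    word_ids=[0, 1, 1, 1],
    token_probs=[[0.9, 0.1], [0.55, 0.45], [0.2, 0.8], [0.4, 0.6]],
))
# → [(0, 'O', 0.9), (1, 'DISORDER_FINDING', 0.8)]
```

Under a "first subword" rule, word 1 in this toy example would be labeled `O`. Taking the most confident subword avoids letting an uninformative leading fragment decide the label.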
## Training Details

- **Training epochs**: 1
- **Learning rate**: N/A
- **Training batch size**: 32
- **Evaluation batch size**: 32
- **Max sequence length**: 256
- **Warmup steps**: N/A
- **FP16**: False
- **Gradient accumulation steps**: 2
- **Evaluation accumulation steps**: 2
- **Save steps**: 15000
- **Evaluation steps**: 10000
- **Evaluation strategy**: steps
- **Random seed**: 33
- **Label all tokens**: True
- **Balanced training**: False
- **Chunk mode**: sliding_window
- **Stride**: 16
- **Max training samples**: None
- **Max evaluation samples**: 10000
- **Early stopping patience**: 0
- **Early stopping threshold**: 0.0

## Use Case Configuration

- **Use case name**: demo
- **Language**: German (de)
- **Target entities**: DISORDER_FINDING
- **Text processing max length**: N/A
- **Entity labeling scheme**: N/A

## Usage

### Using Transformers Pipeline

```python
from transformers import pipeline

# Load the model; "max" aggregation groups subword predictions into word-level entities
ner_pipeline = pipeline(
    "ner",
    model="DedalusHealthCare/tinybert-ner-demo-de",
    tokenizer="DedalusHealthCare/tinybert-ner-demo-de",
    aggregation_strategy="max",
)

# Example text ("The patient has diabetes and high blood pressure.")
text = "Der Patient hat Diabetes und Bluthochdruck."

# Get predictions
entities = ner_pipeline(text)
print(entities)
```

### Using AutoModel and AutoTokenizer

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

# Load model and tokenizer
model_name = "DedalusHealthCare/tinybert-ner-demo-de"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Tokenize text
text = "Der Patient hat Diabetes und Bluthochdruck."
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

# Get per-token label probabilities
with torch.no_grad():
    outputs = model(**inputs)
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

# Map each subword token (including special tokens added by the tokenizer) to its label
predicted_token_class_ids = predictions.argmax(-1)
labels = [model.config.id2label[idx.item()] for idx in predicted_token_class_ids[0]]
for token, label in zip(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist()), labels):
    print(token, label)
```

### Using ONNX Runtime (Optimized Inference)

```python
from optimum.onnxruntime import ORTModelForTokenClassification
from transformers import AutoTokenizer, pipeline
import torch

# Load the ONNX model for faster inference
model_name = "DedalusHealthCare/tinybert-ner-demo-de"
onnx_model = ORTModelForTokenClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Create a pipeline with the ONNX model (recommended)
ner_pipeline = pipeline(
    "ner",
    model=onnx_model,
    tokenizer=tokenizer,
    aggregation_strategy="max",
)

# Example text
text = "Der Patient hat Diabetes und Bluthochdruck."
entities = ner_pipeline(text)
print(entities)

# Direct model usage (token-level labels, no aggregation)
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
with torch.no_grad():
    outputs = onnx_model(**inputs)
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
predicted_token_class_ids = predictions.argmax(-1)
token_labels = [onnx_model.config.id2label[idx.item()] for idx in predicted_token_class_ids[0]]
```

### Performance Comparison

- **PyTorch**: Standard format, suitable for training and research
- **ONNX**: Optimized for inference, typically 2-4x faster than PyTorch
- **Recommendation**: Use ONNX for production inference and PyTorch for research

## Model Architecture

This model is based on the TinyBERT architecture with a token classification head for Named Entity Recognition.
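The token classification head can be illustrated with a minimal sketch. It assumes the standard Hugging Face `BertForTokenClassification` layout: dropout followed by a single linear layer from the hidden size to the number of labels. Under that assumption, the head projects each token's final hidden state to one logit per label. Dropout is inactive at inference, so the sketch leaves it out. The function name, weights, dimensions and label names below are hypothetical toy values, not this model's parameters.

```python
import math
from typing import Dict, List, Tuple


def classify_tokens(
    hidden_states: List[List[float]],  # seq_len x hidden_size (encoder output)
    weight: List[List[float]],         # num_labels x hidden_size
    bias: List[float],                 # num_labels
    id2label: Dict[int, str],
) -> List[Tuple[str, float]]:
    """Apply a linear token-classification head to every token and return
    (label, probability) for each token's highest-scoring label."""
    results = []
    for h in hidden_states:
        # Linear projection: one logit per label
        logits = [sum(w * x for w, x in zip(row, h)) + b for row, b in zip(weight, bias)]
        # Numerically stable softmax; it yields scores but does not change the winning label
        m = max(logits)
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        probs = [e / total for e in exps]
        best = max(range(len(probs)), key=lambda i: probs[i])
        results.append((id2label[best], probs[best]))
    return results


# Toy example: 2 tokens, hidden size 2, 2 labels (all values made up)
print(classify_tokens(
    hidden_states=[[1.0, 0.0], [0.0, 1.0]],
    weight=[[1.0, -1.0], [-1.0, 1.0]],
    bias=[0.0, 0.0],
    id2label={0: "O", 1: "DISORDER_FINDING"},
))
```

This per-token argmax over softmax scores matches the "Direct model usage" snippets in the Usage section, where `outputs.logits` plays the role of the projected logits.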
## Intended Use

This model is intended for:

- Named Entity Recognition in German medical texts
- Identification of DISORDER_FINDING entities
- Medical text processing and analysis
- Research and development in medical NLP

## Limitations

- Trained specifically on German medical texts
- Performance may vary on texts from other medical domains
- May not generalize well to non-medical texts
- Should be evaluated carefully before use on new datasets

## Ethical Considerations

- This model is trained on medical data and should be used responsibly
- Outputs should be validated by medical professionals
- Patient privacy and data protection regulations must be followed
- The model may reflect biases present in the training data

## Model Performance

This model has been evaluated on the **goldset from ner_disorderfinding_de_goldset** using IO evaluation (sklearn, token level, lenient), with the following results:

### Overall Performance

| Metric | Score |
|--------|-------|
| Precision (Macro) | 0.423825 |
| Recall (Macro) | 0.467183 |
| F1-Score (Macro) | 0.435170 |
| Precision (Weighted) | 0.599471 |
| Recall (Weighted) | 0.697989 |
| F1-Score (Weighted) | 0.640426 |

**Inference Performance**: 5.53 seconds to process the evaluation dataset

### Entity-Level Performance (IO Evaluation)

| Entity Type | Precision | Recall | F1-Score | Support |
|-------------|-----------|--------|----------|---------|
| DISORDER_FINDING | 0.753533 | 0.900434 | 0.820460 | N/A |

### Evaluation Details

- **Dataset**: goldset from ner_disorderfinding_de_goldset
- **Dataset Source**: goldset
- **Evaluation Date**: 2025-11-03 12:25:56
- **Language**: de
- **Entities**: DISORDER_FINDING

*This evaluation section is automatically generated and updated.*

## Citation

If you use this model, please cite:

```bibtex
@misc{demo_de_ner_model,
  title = {TinyBERT for Demo NER (German)},
  author = {{DH Healthcare GmbH}},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/DedalusHealthCare/tinybert-ner-demo-de}
}
```

## License

This model is proprietary and owned by DH Healthcare GmbH. All rights reserved.

## Contact

For questions or support, please contact DH Healthcare GmbH.