---
license: other
base_model: DedalusHealthCare/tinybert-mlm-de
datasets:
- DedalusHealthCare/ner_demo_de
task_categories:
- token-classification
task_ids:
- named-entity-recognition
language:
- de
tags:
- token-classification
- ner
- named-entity-recognition
- de
- disorder_finding
library_name: transformers
pipeline_tag: token-classification
---

# TinyBERT for Demo NER (German)

## Model Description

This is a TinyBERT model fine-tuned for Named Entity Recognition (NER) of DISORDER_FINDING entities in German medical texts.
It was fine-tuned from the [DedalusHealthCare/tinybert-mlm-de](https://huggingface.co/DedalusHealthCare/tinybert-mlm-de) masked language model on the [DedalusHealthCare/ner_demo_de](https://huggingface.co/datasets/DedalusHealthCare/ner_demo_de) dataset.

- **Base Model**: [DedalusHealthCare/tinybert-mlm-de](https://huggingface.co/DedalusHealthCare/tinybert-mlm-de)
- **Training Dataset**: [DedalusHealthCare/ner_demo_de](https://huggingface.co/datasets/DedalusHealthCare/ner_demo_de)
- **Task**: Token Classification (Named Entity Recognition)
- **Language**: German (de)
- **Entities**: DISORDER_FINDING
- **Model Formats**: PyTorch and ONNX

**Please use `aggregation_strategy="max"` in the NER pipeline (see the example below).**

## Training Details

- **Training epochs**: 1
- **Learning rate**: N/A
- **Training batch size**: 32
- **Evaluation batch size**: 32
- **Max sequence length**: 256
- **Warmup steps**: N/A
- **FP16**: False
- **Gradient accumulation steps**: 2
- **Evaluation accumulation steps**: 2
- **Save steps**: 15000
- **Evaluation steps**: 10000
- **Evaluation strategy**: steps
- **Random seed**: 33
- **Label all tokens**: True
- **Balanced training**: False
- **Chunk mode**: sliding_window
- **Stride**: 16
- **Max training samples**: None
- **Max evaluation samples**: 10000
- **Early stopping patience**: 0
- **Early stopping threshold**: 0.0
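
The card does not publish its preprocessing code. As a rough illustration only, the sketch below shows one common reading of two settings above: **Label all tokens** (every sub-word token inherits its word's label instead of only the first sub-token) and **Chunk mode** `sliding_window` with **Stride** 16 (long inputs split into overlapping windows of at most 256 tokens). It assumes that the stride is the number of tokens shared by consecutive windows, as with the `stride` argument of Hugging Face tokenizers, and it ignores the positions taken by special tokens. `align_labels` and `sliding_windows` are hypothetical helpers, not part of this model's training code.

```python
from typing import List, Optional, Tuple

# PyTorch's CrossEntropyLoss skips targets equal to -100 by default
IGNORE_INDEX = -100


def align_labels(
    word_ids: List[Optional[int]],
    word_labels: List[int],
    label_all_tokens: bool = True,
) -> List[int]:
    """Expand word-level label ids to sub-word tokens (hypothetical helper).

    word_ids holds the word index of each token, or None for special tokens,
    like the output of a fast tokenizer's word_ids().
    """
    aligned = []
    previous = None
    for word in word_ids:
        if word is None:
            aligned.append(IGNORE_INDEX)  # [CLS], [SEP], padding
        elif word != previous:
            aligned.append(word_labels[word])  # first sub-token of a word
        else:
            # Later sub-tokens: copy the word label, or hide them from the loss
            aligned.append(word_labels[word] if label_all_tokens else IGNORE_INDEX)
        previous = word
    return aligned


def sliding_windows(
    n_tokens: int, max_len: int = 256, stride: int = 16
) -> List[Tuple[int, int]]:
    """Return (start, end) token ranges of overlapping windows (hypothetical helper).

    Assumes consecutive windows share `stride` tokens.
    """
    if max_len <= stride:
        raise ValueError("max_len must be larger than stride")
    step = max_len - stride  # how far each window advances
    windows = []
    start = 0
    while True:
        end = min(start + max_len, n_tokens)
        windows.append((start, end))
        if end == n_tokens:
            break
        start += step
    return windows


print(align_labels([None, 0, 1, 1, None], [0, 1]))  # [-100, 0, 1, 1, -100]
print(align_labels([None, 0, 1, 1, None], [0, 1], label_all_tokens=False))  # [-100, 0, 1, -100, -100]
print(sliding_windows(600))  # [(0, 256), (240, 496), (480, 600)]
```

With 600 tokens each window advances by 240 tokens, so the tokens near every window boundary are seen twice; at inference time the overlapping predictions then need to be reconciled in some way.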

## Use Case Configuration

- **Use case name**: demo
- **Language**: German (de)
- **Target entities**: DISORDER_FINDING
- **Text processing max length**: N/A
- **Entity labeling scheme**: N/A

## Usage

### Using Transformers Pipeline

```python
from transformers import pipeline

# Load the model; "max" aggregation merges sub-word predictions into whole words
ner_pipeline = pipeline(
    "ner",
    model="DedalusHealthCare/tinybert-ner-demo-de",
    tokenizer="DedalusHealthCare/tinybert-ner-demo-de",
    aggregation_strategy="max",
)

# Example text ("The patient has diabetes and high blood pressure.")
text = "Der Patient hat Diabetes und Bluthochdruck."

# Get predictions
entities = ner_pipeline(text)
print(entities)
```
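
To make the recommended `max` strategy more concrete: according to the Transformers documentation, `max` gives each word the label of its sub-token with the highest score, so the sub-tokens of one word can never end up with different tags. The plain-Python sketch below mimics that rule for illustration only; it is not the pipeline's actual implementation. The label names `O` and `I-DISORDER_FINDING` and the function `max_aggregate` are hypothetical, so check `model.config.id2label` for the real label names.

```python
from typing import Dict, List, Optional, Sequence


def max_aggregate(
    word_ids: Sequence[Optional[int]],
    token_scores: Sequence[Sequence[float]],
    id2label: Dict[int, str],
) -> List[str]:
    """Give each word the label of its highest-scoring sub-token (sketch).

    word_ids: word index per token, None for special tokens.
    token_scores: class probabilities per token.
    """
    best = {}  # word index -> (top score, label id) of its best sub-token
    for word, scores in zip(word_ids, token_scores):
        if word is None:
            continue  # skip [CLS], [SEP], padding
        label_id = max(range(len(scores)), key=scores.__getitem__)
        score = scores[label_id]
        if word not in best or score > best[word][0]:
            best[word] = (score, label_id)
    return [id2label[best[word][1]] for word in sorted(best)]


id2label = {0: "O", 1: "I-DISORDER_FINDING"}  # hypothetical label map
word_ids = [None, 0, 1, 1, None]  # e.g. [CLS] "Der" "Blut" "##druck" [SEP]
token_scores = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.55, 0.45],  # first sub-token of word 1 leans towards "O" ...
    [0.1, 0.9],    # ... but the second sub-token is more confident
    [0.9, 0.1],
]
print(max_aggregate(word_ids, token_scores, id2label))  # ['O', 'I-DISORDER_FINDING']
```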

### Using AutoModel and AutoTokenizer

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Load model and tokenizer
model_name = "DedalusHealthCare/tinybert-ner-demo-de"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Tokenize text
text = "Der Patient hat Diabetes und Bluthochdruck."
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

# Get predictions (softmax turns the logits into per-class probabilities)
with torch.no_grad():
    outputs = model(**inputs)
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

# Map the most likely class of each sub-word token to its label
predicted_token_class_ids = predictions.argmax(-1)
labels = [model.config.id2label[label_id.item()] for label_id in predicted_token_class_ids[0]]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, label in zip(tokens, labels):
    print(token, label)
```
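
The code above yields one label per sub-word token. One way to turn those labels into character spans of the original text is sketched below. It assumes a fast tokenizer called with `return_offsets_mapping=True`; remove the offsets with `inputs.pop("offset_mapping")` before calling `model(**inputs)`, since the model does not accept that argument. Consecutive tokens sharing a non-`O` label are merged, which fits an IO labeling scheme. The function `spans_from_offsets`, the offsets, and the label names in the example are hypothetical.

```python
from typing import List, Sequence, Tuple


def spans_from_offsets(
    text: str,
    offsets: Sequence[Tuple[int, int]],
    labels: Sequence[str],
    outside: str = "O",
) -> List[Tuple[int, int, str, str]]:
    """Merge consecutive tokens with the same non-outside label into
    (start, end, label, surface text) spans (sketch for IO-style labels)."""
    spans = []
    current = None  # [start, end, label] of the span being built
    for (start, end), label in zip(offsets, labels):
        if start == end:
            continue  # special tokens such as [CLS]/[SEP] have empty offsets
        if label != outside and current is not None and current[2] == label:
            current[1] = end  # extend the open span
            continue
        if current is not None:
            spans.append(current)
            current = None
        if label != outside:
            current = [start, end, label]
    if current is not None:
        spans.append(current)
    return [(s, e, lab, text[s:e]) for s, e, lab in spans]


text = "Der Patient hat Diabetes und Bluthochdruck."
# Hypothetical offsets: [CLS] Der Patient hat Diabetes und Blut hoch druck . [SEP]
offsets = [(0, 0), (0, 3), (4, 11), (12, 15), (16, 24), (25, 28),
           (29, 33), (33, 37), (37, 42), (42, 43), (0, 0)]
df = "I-DISORDER_FINDING"  # hypothetical label name
labels = ["O", "O", "O", "O", df, "O", df, df, df, "O", "O"]
for span in spans_from_offsets(text, offsets, labels):
    print(span)
# (16, 24, 'I-DISORDER_FINDING', 'Diabetes')
# (29, 42, 'I-DISORDER_FINDING', 'Bluthochdruck')
```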

### Using ONNX Runtime (Optimized Inference)

```python
import torch
from optimum.onnxruntime import ORTModelForTokenClassification
from transformers import AutoTokenizer, pipeline

# Load the ONNX model for faster inference
model_name = "DedalusHealthCare/tinybert-ner-demo-de"
onnx_model = ORTModelForTokenClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Create a pipeline with the ONNX model (recommended)
ner_pipeline = pipeline(
    "ner",
    model=onnx_model,
    tokenizer=tokenizer,
    aggregation_strategy="max",
)

# Example text
text = "Der Patient hat Diabetes und Bluthochdruck."
entities = ner_pipeline(text)
print(entities)

# Direct model usage, without the pipeline
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
with torch.no_grad():
    outputs = onnx_model(**inputs)
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
predicted_token_class_ids = predictions.argmax(-1)
token_labels = [onnx_model.config.id2label[label_id.item()] for label_id in predicted_token_class_ids[0]]
```

### Performance Comparison

- **PyTorch**: Standard format, suitable for training and research
- **ONNX**: Optimized for inference; typically 2-4x faster than PyTorch, although the actual speedup depends on hardware, batch size, and sequence length
- **Recommendation**: Use ONNX for production inference and PyTorch for research
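
Because the ONNX speedup varies with hardware and input size, it is worth measuring on your own setup. A minimal timing helper using only the standard library could look like this; `benchmark` is a hypothetical name, and taking the median of repeated runs after a few untimed warm-up calls is one reasonable choice, not a prescribed methodology.

```python
import statistics
import time
from typing import Any, Callable


def benchmark(fn: Callable[..., Any], *args: Any, repeats: int = 20, warmup: int = 3) -> float:
    """Return the median wall-clock time in seconds of fn(*args) over `repeats` calls."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    for _ in range(warmup):
        fn(*args)  # untimed warm-up calls (lazy initialization, caches)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# Toy usage: time a pure-Python function
print(f"{benchmark(sorted, list(range(10_000, 0, -1))):.6f} s")
```

For example, with a PyTorch pipeline and an ONNX pipeline built as in the examples above and stored under two different names (say `torch_ner` and `onnx_ner`), compare `benchmark(torch_ner, text)` with `benchmark(onnx_ner, text)`.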

## Model Architecture

This model is based on the TinyBERT architecture with a token classification head for Named Entity Recognition.

## Intended Use

This model is intended for:

- Named Entity Recognition in German medical texts
- Identification of DISORDER_FINDING entities
- Medical text processing and analysis
- Research and development in medical NLP

## Limitations

- Trained specifically on German medical texts
- Performance may vary on texts from other medical domains
- May not generalize well to non-medical texts
- Should be evaluated carefully before use on new datasets

## Ethical Considerations

- This model was trained on medical data and should be used responsibly
- Outputs should be validated by medical professionals
- Patient privacy and data protection regulations must be followed
- The model may reflect biases present in the training data

## Model Performance

This model has been evaluated on the **goldset from ner_disorderfinding_de_goldset** using
IO evaluation (sklearn, token level, lenient) with the following results:

### Overall Performance

| Metric | Score |
|--------|-------|
| Precision (Macro) | 0.423825 |
| Recall (Macro) | 0.467183 |
| F1-Score (Macro) | 0.435170 |
| Precision (Weighted) | 0.599471 |
| Recall (Weighted) | 0.697989 |
| F1-Score (Weighted) | 0.640426 |

**Inference Time**: 5.53 seconds for the full evaluation dataset
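
To read the table: under scikit-learn's conventions, the macro average is the unweighted mean of the per-label scores, while the weighted average weights each label by its support (its number of tokens in the gold data). Macro F1 is the mean of the per-label F1 scores, not the harmonic mean of macro precision and macro recall. The card does not list which labels enter these averages, so the pure-Python sketch below only illustrates the averaging on made-up token labels; `token_level_report` is a hypothetical helper, not the evaluation code used for this model.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def token_level_report(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, object]:
    """Per-label and averaged precision/recall/F1 over flat token label lists.

    Follows scikit-learn's conventions: labels are the union of y_true and y_pred,
    a zero denominator yields 0.0, macro = unweighted mean over labels,
    weighted = mean weighted by support (count of each label in y_true).
    """
    labels = sorted(set(y_true) | set(y_pred))
    true_positives = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    predicted = Counter(y_pred)
    support = Counter(y_true)

    per_label: Dict[str, Tuple[float, float, float]] = {}
    for lab in labels:
        precision = true_positives[lab] / predicted[lab] if predicted[lab] else 0.0
        recall = true_positives[lab] / support[lab] if support[lab] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_label[lab] = (precision, recall, f1)

    report: Dict[str, object] = {"per_label": per_label}
    for i, name in enumerate(("precision", "recall", "f1")):
        report[f"{name}_macro"] = sum(per_label[lab][i] for lab in labels) / len(labels)
        report[f"{name}_weighted"] = sum(per_label[lab][i] * support[lab] for lab in labels) / len(y_true)
    return report


y_true = ["O", "O", "I-DF", "I-DF", "O"]  # made-up gold labels
y_pred = ["O", "I-DF", "I-DF", "O", "O"]  # made-up predictions
report = token_level_report(y_true, y_pred)
print(round(report["precision_macro"], 4), round(report["precision_weighted"], 4))  # 0.5833 0.6
```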

### Entity-Level Performance (IO Evaluation)

| Entity Type | Precision | Recall | F1-Score | Support |
|-------------|-----------|--------|----------|---------|
| DISORDER_FINDING | 0.753533 | 0.900434 | 0.820460 | N/A |

### Evaluation Details

- **Dataset**: goldset from ner_disorderfinding_de_goldset
- **Dataset Source**: goldset
- **Evaluation Date**: 2025-11-03 12:25:56
- **Language**: de
- **Entities**: DISORDER_FINDING

*This evaluation section is automatically generated and updated.*

## Citation

If you use this model, please cite:

```bibtex
@misc{demo_de_ner_model,
  title     = {TinyBERT for Demo NER (German)},
  author    = {{DH Healthcare GmbH}},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/DedalusHealthCare/tinybert-ner-demo-de}
}
```

## License

This model is proprietary and owned by DH Healthcare GmbH. All rights reserved.

## Contact

For questions or support, please contact DH Healthcare GmbH.