edloginovad committed on
Commit c8d953c · verified · 1 Parent(s): a4d261c

Model save

Files changed (1):
  1. README.md +34 -224

README.md CHANGED
@@ -1,246 +1,56 @@
  ---
  license: other
  base_model: DedalusHealthCare/tinybert-mlm-de
- datasets:
- - DedalusHealthCare/ner_demo_de
- task_categories:
- - token-classification
- task_ids:
- - named-entity-recognition
- language:
- - de
  tags:
- - token-classification
- - ner
- - named-entity-recognition
- - de
- - disorder_finding
- library_name: transformers
- pipeline_tag: token-classification
  ---

- # TinyBERT for Demo NER (German)
-
- ## Model Description
-
- This model is a fine-tuned TinyBERT model for Named Entity Recognition (NER) of DISORDER_FINDING entities in German medical texts.
-
- It was fine-tuned from the [DedalusHealthCare/tinybert-mlm-de](https://huggingface.co/DedalusHealthCare/tinybert-mlm-de) masked language model using the [DedalusHealthCare/ner_demo_de](https://huggingface.co/datasets/DedalusHealthCare/ner_demo_de) dataset.
-
- **Base Model**: [DedalusHealthCare/tinybert-mlm-de](https://huggingface.co/DedalusHealthCare/tinybert-mlm-de)
-
- **Training Dataset**: [DedalusHealthCare/ner_demo_de](https://huggingface.co/datasets/DedalusHealthCare/ner_demo_de)
-
- **Task**: Token Classification (Named Entity Recognition)
-
- **Language**: German (de)
-
- **Entities**: DISORDER_FINDING
-
- **Model Format**: PYTORCH+ONNX
-
- **Please use `max` as the aggregation strategy in the NER pipeline (see the example below).**
-
- ## Training Details
-
- - **Training epochs**: 1
- - **Learning rate**: N/A
- - **Training batch size**: 32
- - **Evaluation batch size**: 32
- - **Max sequence length**: 256
- - **Warmup steps**: N/A
- - **FP16**: False
- - **Gradient accumulation steps**: 2
- - **Evaluation accumulation steps**: 2
- - **Save steps**: 15000
- - **Evaluation steps**: 10000
- - **Evaluation strategy**: steps
- - **Random seed**: 33
- - **Label all tokens**: True
- - **Balanced training**: False
- - **Chunk mode**: sliding_window
- - **Stride**: 16
- - **Max training samples**: None
- - **Max evaluation samples**: 10000
- - **Early stopping patience**: 0
- - **Early stopping threshold**: 0.0
-
- ## Use Case Configuration
-
- - **Use case name**: demo
- - **Language**: German (de)
- - **Target entities**: DISORDER_FINDING
- - **Text processing max length**: N/A
- - **Entity labeling scheme**: N/A
-
- ## Usage
-
- ### Using the Transformers Pipeline
-
- ```python
- from transformers import pipeline
-
- # Load the model
- ner_pipeline = pipeline(
-     "ner",
-     model="DedalusHealthCare/tinybert-demo-de",
-     tokenizer="DedalusHealthCare/tinybert-demo-de",
-     aggregation_strategy="max",
- )
-
- # Example text
- text = "Der Patient hat Diabetes und Bluthochdruck."
-
- # Get predictions
- entities = ner_pipeline(text)
- print(entities)
- ```
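The `aggregation_strategy="max"` the card insists on can be illustrated conceptually: word pieces belonging to one word are grouped, and the word takes the label of its highest-scoring piece. A minimal pure-Python sketch with hypothetical tokens and scores (not the pipeline's actual implementation):

```python
# Hypothetical subword tokens with per-label scores; "max" aggregation
# picks, for each word, the label of the highest-scoring piece.
subwords = [
    ("Blut",    {"O": 0.10, "DISORDER_FINDING": 0.90}),
    ("##hoch",  {"O": 0.60, "DISORDER_FINDING": 0.40}),
    ("##druck", {"O": 0.20, "DISORDER_FINDING": 0.80}),
]

def aggregate_max(pieces):
    """Join the pieces into one word and return the single best
    (label, score) pair over all pieces."""
    word = "".join(p[0].removeprefix("##") for p in pieces)
    label, score = max(
        ((lbl, s) for _, scores in pieces for lbl, s in scores.items()),
        key=lambda ls: ls[1],
    )
    return word, label, score

word, label, score = aggregate_max(subwords)
print(word, label, score)  # Bluthochdruck DISORDER_FINDING 0.9
```

With `"simple"` or token-level output, the same word could be split across conflicting labels; `"max"` avoids that by deciding once per word.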
-
- ### Using AutoModel and AutoTokenizer
-
- ```python
- from transformers import AutoTokenizer, AutoModelForTokenClassification
- import torch
-
- # Load model and tokenizer
- model_name = "DedalusHealthCare/tinybert-demo-de"
- tokenizer = AutoTokenizer.from_pretrained(model_name)
- model = AutoModelForTokenClassification.from_pretrained(model_name)
-
- # Tokenize text
- text = "Der Patient hat Diabetes und Bluthochdruck."
- tokens = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
-
- # Get predictions
- with torch.no_grad():
-     outputs = model(**tokens)
-     predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
-
- # Get labels
- predicted_token_class_ids = predictions.argmax(-1)
- labels = [model.config.id2label[id.item()] for id in predicted_token_class_ids[0]]
- ```
-
- ### Using ONNX Runtime (Optimized Inference)
-
- ```python
- from optimum.onnxruntime import ORTModelForTokenClassification
- from transformers import AutoTokenizer, pipeline
- import torch
-
- # Load the ONNX model for faster inference
- model_name = "DedalusHealthCare/tinybert-demo-de"
- onnx_model = ORTModelForTokenClassification.from_pretrained(model_name)
- tokenizer = AutoTokenizer.from_pretrained(model_name)
-
- # Create a pipeline with the ONNX model (recommended)
- ner_pipeline = pipeline(
-     "ner",
-     model=onnx_model,
-     tokenizer=tokenizer,
-     aggregation_strategy="max",
- )
-
- # Example text
- text = "Der Patient hat Diabetes und Bluthochdruck."
- entities = ner_pipeline(text)
- print(entities)
-
- # Direct model usage
- inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
- with torch.no_grad():
-     outputs = onnx_model(**inputs)
-     predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
-
- predicted_token_class_ids = predictions.argmax(-1)
- token_labels = [onnx_model.config.id2label[id.item()] for id in predicted_token_class_ids[0]]
- ```
-
- ### Performance Comparison
-
- - **PyTorch**: Standard format, suitable for training and research
- - **ONNX**: Optimized for inference, typically 2-4x faster than PyTorch
- - **Recommendation**: Use ONNX for production inference and PyTorch for research
-
- ## Model Architecture
-
- This model is based on the TinyBERT architecture with a token classification head for Named Entity Recognition.
-
- ## Intended Use
-
- This model is intended for:
- - Named Entity Recognition in German medical texts
- - Identification of DISORDER_FINDING entities
- - Medical text processing and analysis
- - Research and development in medical NLP
-
- ## Limitations
-
- - Trained specifically on German medical texts
- - Performance may vary on texts from different medical domains
- - May not generalize well to non-medical texts
- - Requires careful evaluation on new datasets
-
- ## Ethical Considerations
-
- - This model is trained on medical data and should be used responsibly
- - Outputs should be validated by medical professionals
- - Patient privacy and data protection regulations must be followed
- - The model may reflect biases present in the training data
-
- ## Model Performance
-
- This model has been evaluated on the **goldset from ner_disorderfinding_de_goldset** using IO evaluation (sklearn, token level, lenient), with the following results:
-
- ### Overall Performance
-
- | Metric | Score |
- |--------|-------|
- | Precision (Macro) | 0.425082 |
- | Recall (Macro) | 0.467785 |
- | F1-Score (Macro) | 0.435900 |
- | Precision (Weighted) | 0.600185 |
- | Recall (Weighted) | 0.698514 |
- | F1-Score (Weighted) | 0.640943 |
-
- **Inference Performance**: 5.65 seconds for the evaluation dataset
-
- ### Entity-Level Performance (IO Evaluation)
-
- | Entity Type | Precision | Recall | F1-Score | Support |
- |-------------|-----------|--------|----------|---------|
- | DISORDER_FINDING | 0.753771 | 0.900890 | 0.820790 | N/A |
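The entity-level F1 in the table can be sanity-checked from the reported precision and recall, since F1 is their harmonic mean:

```python
# Recompute DISORDER_FINDING F1 from the card's precision and recall:
# F1 = 2 * P * R / (P + R). The inputs are rounded to 6 decimals, so
# the result agrees with the reported 0.820790 only to ~5 decimals.
precision, recall = 0.753771, 0.900890
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 5))  # 0.82079
```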
-
- ### Evaluation Details
-
- - **Dataset**: goldset from ner_disorderfinding_de_goldset
- - **Dataset Source**: goldset
- - **Evaluation Date**: 2025-09-25 09:38:17
- - **Language**: de
- - **Entities**: DISORDER_FINDING
-
- *This evaluation section is automatically generated and updated.*
-
- ## Citation
-
- If you use this model, please cite:
-
- ```bibtex
- @misc{demo_de_ner_model,
-   title = {TinyBERT for Demo NER (German)},
-   author = {DH Healthcare GmbH},
-   year = {2025},
-   publisher = {Hugging Face},
-   url = {https://huggingface.co/DedalusHealthCare/tinybert-demo-de}
- }
- ```
-
- ## License
-
- This model is proprietary and owned by DH Healthcare GmbH. All rights reserved.
-
- ## Contact
-
- For questions or support, please contact DH Healthcare GmbH.
  ---
+ library_name: transformers
  license: other
  base_model: DedalusHealthCare/tinybert-mlm-de
  tags:
+ - generated_from_trainer
+ model-index:
+ - name: tinybert-demo-de
+   results: []
  ---

+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->

+ # tinybert-demo-de

+ This model is a fine-tuned version of [DedalusHealthCare/tinybert-mlm-de](https://huggingface.co/DedalusHealthCare/tinybert-mlm-de) on the None dataset.

+ ## Model description

+ More information needed

+ ## Intended uses & limitations

+ More information needed

+ ## Training and evaluation data

+ More information needed

+ ## Training procedure

+ ### Training hyperparameters

+ The following hyperparameters were used during training:
+ - learning_rate: 5e-05
+ - train_batch_size: 32
+ - eval_batch_size: 32
+ - seed: 33
+ - gradient_accumulation_steps: 2
+ - total_train_batch_size: 64
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - lr_scheduler_warmup_ratio: 0.1
+ - num_epochs: 1
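The hyperparameters above imply an effective batch size of 64 (32 per device × 2 accumulation steps) and a linear learning-rate schedule with 10% warmup. A small sketch of that schedule (step counts are hypothetical; this mirrors, but does not call, the `transformers` linear scheduler):

```python
# Sketch of the LR schedule implied by the card's hyperparameters:
# linear warmup over the first 10% of steps, then linear decay to 0.
def linear_schedule_lr(step, total_steps, base_lr=5e-05, warmup_ratio=0.1):
    """Learning rate at a given optimizer step under linear warmup/decay."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

# Effective batch size: per-device batch * gradient accumulation steps
total_train_batch_size = 32 * 2

if __name__ == "__main__":
    total = 1000  # hypothetical number of optimizer steps
    print(total_train_batch_size)        # 64, matching the card
    print(linear_schedule_lr(0, total))    # start of warmup
    print(linear_schedule_lr(100, total))  # peak learning rate
    print(linear_schedule_lr(total, total))  # decayed to zero
```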
 
+ ### Training results

+ ### Framework versions

+ - Transformers 4.45.1
+ - Pytorch 2.6.0+cu124
+ - Datasets 2.16.0
+ - Tokenizers 0.20.3