edloginovad committed on
Commit 751f591 · verified · 1 Parent(s): ba4a464

Update model card with detailed information

Files changed (1)
  1. README.md +149 -47
README.md CHANGED
@@ -1,70 +1,172 @@
  ---
- library_name: transformers
- language:
- - multilingual
  license: other
  base_model: DedalusHealthCare/tinybert-mlm-de
  tags:
- - generated_from_trainer
- datasets:
- - ner_demo_de
  model-index:
- - name: tinybert-demo-de
-   results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

- # tinybert-demo-de

- This model is a fine-tuned version of [DedalusHealthCare/tinybert-mlm-de](https://huggingface.co/DedalusHealthCare/tinybert-mlm-de) on the ner_demo_de dataset.
- It achieves the following results on the evaluation set:
- - Loss: 0.4069
- - Disorder Finding Precision: 0.25
- - Disorder Finding Recall: 0.1818
- - Disorder Finding F1: 0.2105
- - Disorder Finding Number: 11
- - Overall Precision: 0.25
- - Overall Recall: 0.1818
- - Overall F1: 0.2105
- - Overall Accuracy: 0.9286
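The removed evaluation figures are internally consistent. In the seqeval-based metrics commonly used with the Transformers token-classification examples, the `Number` field is presumably the count of gold DISORDER_FINDING spans (11). On that basis, a recall of 0.1818 corresponds to 2 correct spans (2/11), and a precision of 0.25 corresponds to 8 predicted spans (2/8). These counts are inferred from the reported ratios and are not stated in the card. A minimal sketch of the entity-level arithmetic, assuming exact-span matching:

```python
def entity_prf(tp: int, n_pred: int, n_gold: int) -> tuple[float, float, float]:
    """Entity-level precision, recall and F1 from span counts (exact-match, seqeval-style)."""
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Counts inferred (not reported) from the card: 11 gold spans, 2 correct, 8 predicted
p, r, f1 = entity_prf(tp=2, n_pred=8, n_gold=11)
print(round(p, 4), round(r, 4), round(f1, 4))
```

There is only one entity type, so the micro-averaged overall precision, recall and F1 equal the per-type values, which matches the card.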
 
- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

- ## Training procedure

- ### Training hyperparameters

- The following hyperparameters were used during training:
- - learning_rate: 5e-05
- - train_batch_size: 32
- - eval_batch_size: 32
- - seed: 33
- - gradient_accumulation_steps: 2
- - total_train_batch_size: 64
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: linear
- - lr_scheduler_warmup_ratio: 0.1
- - num_epochs: 1
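With `lr_scheduler_type: linear` and `lr_scheduler_warmup_ratio: 0.1`, the learning rate rises from 0 to `learning_rate` over the first 10% of optimizer steps and then decays linearly back to 0. The plain-Python sketch below mirrors the multiplier that Transformers' `get_linear_schedule_with_warmup` applies. The step counts in the example are hypothetical, because the card does not state how many optimizer steps the single epoch contained.

```python
def linear_warmup_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Learning rate after `step` optimizer updates under a linear schedule with warmup."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr
        return base_lr * step / max(1, warmup_steps)
    # Decay: fall linearly from base_lr to 0 at total_steps
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


BASE_LR = 5e-05      # learning_rate from the card
TOTAL_STEPS = 100    # hypothetical number of optimizer steps
WARMUP_STEPS = 10    # 10% of TOTAL_STEPS, following lr_scheduler_warmup_ratio: 0.1

for step in (0, 5, 10, 55, 100):
    print(step, linear_warmup_lr(step, BASE_LR, WARMUP_STEPS, TOTAL_STEPS))
```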
 
- ### Training results

- ### Framework versions

- - Transformers 4.45.1
- - Pytorch 2.6.0+cu124
- - Datasets 2.16.0
- - Tokenizers 0.20.3
 
  ---
  license: other
  base_model: DedalusHealthCare/tinybert-mlm-de
  tags:
+ - token-classification
+ - ner
+ - medical
+ - demo
+ - de
+ - pytorch
+ - transformers
+ language:
+ - de
+ pipeline_tag: token-classification
+ library_name: transformers
  model-index:
+ - name: TinyBERT for Demo NER
+   results:
+   - task:
+       type: token-classification
+       name: Named Entity Recognition
+     dataset:
+       type: demo
+       name: Demo Dataset
+       config: de
+     metrics:
+     - type: f1
+       value: # Will be updated after evaluation
+       name: F1 Score
+     - type: precision
+       value: # Will be updated after evaluation
+       name: Precision
+     - type: recall
+       value: # Will be updated after evaluation
+       name: Recall
  ---
 
+ # TinyBERT for Demo NER (DE)
+
+ ## Model Description
+
+ This model is a fine-tuned TinyBERT model for Named Entity Recognition (NER) of DISORDER_FINDING entities in German medical texts.
+
+ **Base Model**: DedalusHealthCare/tinybert-mlm-de
+
+ **Language**: German (de)
+
+ **Task**: Token Classification (NER)
+
+ **Entities**: DISORDER_FINDING
+
+ ## Training Details
+
+ ### Training Dataset
+
+ **Dataset**: `DedalusHealthCare/ner_demo_de@2025.10.16.13.40.41`
+
+ The model was trained on a versioned dataset with timestamp-based versioning for reproducibility.
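The dataset identifier combines a Hub repository id with a timestamp version after `@`. The sketch below splits that identifier and passes the version to `datasets.load_dataset` through its `revision` argument. It assumes the suffix is a git revision (tag or branch) of the Hub dataset repository, which the card does not confirm. `split_versioned_id` and `load_pinned_dataset` are hypothetical helpers.

```python
from typing import Optional


def split_versioned_id(versioned_id: str) -> tuple[str, Optional[str]]:
    """Split 'org/name@version' into ('org/name', 'version'); without '@' no version is pinned."""
    repo_id, sep, version = versioned_id.partition("@")
    return repo_id, (version if sep else None)


def load_pinned_dataset(versioned_id: str):
    # Hypothetical helper; datasets is imported lazily so the parser works without it
    from datasets import load_dataset

    repo_id, revision = split_versioned_id(versioned_id)
    # Assumption: the version suffix is a tag or branch of the Hub dataset repository
    return load_dataset(repo_id, revision=revision)


print(split_versioned_id("DedalusHealthCare/ner_demo_de@2025.10.16.13.40.41"))
```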
+
+ ### Training Configuration
+ - **Training epochs**: 1
+ - **Learning rate**: 5e-05
+ - **Training batch size**: 32
+ - **Evaluation batch size**: 32
+ - **Max sequence length**: N/A
+ - **Warmup steps**: 0
+ - **Weight decay**: 0.01
+ - **Gradient accumulation steps**: 2
+ - **Mixed precision (FP16)**: False
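With a training batch size of 32 and 2 gradient-accumulation steps, each optimizer update on one device sees 32 × 2 = 64 examples. This matches the removed `total_train_batch_size: 64`. The removed card also listed `lr_scheduler_warmup_ratio: 0.1`, while this list shows `Warmup steps: 0`. If both come from the same Transformers `TrainingArguments`, a warmup step count of 0 means the ratio is used instead, rounded up to a whole step. The plain-Python sketch below reproduces both calculations and approximates how the Trainer counts optimizer steps. It assumes a single device, and the training-set size is hypothetical.

```python
import math


def effective_batch_size(per_device: int, grad_accum: int, n_devices: int = 1) -> int:
    """Examples contributing to one optimizer update."""
    return per_device * grad_accum * n_devices


def optimizer_steps(n_examples: int, per_device: int, grad_accum: int, epochs: float) -> int:
    """Approximate number of optimizer updates for single-device training."""
    batches_per_epoch = math.ceil(n_examples / per_device)       # dataloader length
    updates_per_epoch = max(batches_per_epoch // grad_accum, 1)  # one update per grad_accum batches
    return math.ceil(epochs * updates_per_epoch)


def resolve_warmup(total_steps: int, warmup_steps: int, warmup_ratio: float) -> int:
    """An explicit step count wins; 0 falls back to the ratio, rounded up."""
    return warmup_steps if warmup_steps > 0 else math.ceil(total_steps * warmup_ratio)


N_TRAIN = 1000  # hypothetical training-set size, not stated in the card
steps = optimizer_steps(N_TRAIN, per_device=32, grad_accum=2, epochs=1)
print(effective_batch_size(32, 2))  # 64
print(steps, resolve_warmup(steps, 0, 0.1))
```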
+
+ ### Training Framework
+ - **Framework**: PyTorch with HuggingFace Transformers
+ - **Optimizer**: AdamW
+ - **Scheduler**: Linear with warmup
+
+ ## Usage
+
+ ### Quick Start with Pipeline
+
+ ```python
+ from transformers import pipeline
+
+ # Initialize the NER pipeline
+ ner_pipeline = pipeline(
+     "ner",
+     model="DedalusHealthCare/tinybert-demo-de",
+     tokenizer="DedalusHealthCare/tinybert-demo-de",
+     aggregation_strategy="simple"
+ )
+
+ # Example usage
+ text = "Your medical text here"
+ entities = ner_pipeline(text)
+ print(entities)
+ ```
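With `aggregation_strategy="simple"`, the pipeline returns one dict per grouped entity, with the keys `entity_group`, `score`, `word`, `start` and `end`. The post-processing sketch below keeps confident DISORDER_FINDING spans and reads their surface text from the character offsets. The 0.5 threshold and the sample output are hypothetical. The entity-group name assumes the model's labels reduce to `DISORDER_FINDING` once B-/I- prefixes are merged.

```python
from typing import Any


def keep_entities(entities: list[dict[str, Any]], text: str,
                  label: str = "DISORDER_FINDING",
                  min_score: float = 0.5) -> list[tuple[str, int, int, float]]:
    """Keep spans of one entity group above a score threshold, ordered by position."""
    # min_score=0.5 is an arbitrary, illustrative threshold
    kept = []
    for ent in entities:
        if ent["entity_group"] == label and ent["score"] >= min_score:
            start, end = ent["start"], ent["end"]
            # Slice the input by character offsets: "word" is rebuilt from subword
            # tokens and may not match the original spelling or spacing exactly
            kept.append((text[start:end], start, end, float(ent["score"])))
    return sorted(kept, key=lambda item: item[1])


text = "Der Patient klagt über Kopfschmerzen und Fieber."
# Hypothetical output in the shape returned with aggregation_strategy="simple"
sample_output = [
    {"entity_group": "DISORDER_FINDING", "score": 0.91, "word": "Kopfschmerzen", "start": 23, "end": 36},
    {"entity_group": "DISORDER_FINDING", "score": 0.32, "word": "Fieber", "start": 41, "end": 47},
]
print(keep_entities(sample_output, text))
```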
+
+ ### Advanced Usage
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForTokenClassification
+ import torch
+
+ # Load model and tokenizer
+ model_name = "DedalusHealthCare/tinybert-demo-de"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForTokenClassification.from_pretrained(model_name)
+
+ # Set model to evaluation mode
+ model.eval()
+
+ # Tokenize text
+ text = "Your medical text here"
+ inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
+
+ # Get predictions
+ with torch.no_grad():
+     outputs = model(**inputs)
+     predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

+ # Get predicted labels
+ predicted_token_class_ids = predictions.argmax(-1)
+ labels = [model.config.id2label[id.item()] for id in predicted_token_class_ids[0]]
+ ```
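The `labels` list above has one entry per subword token, including special tokens such as `[CLS]` and `[SEP]`. A common way to get word-level entities is to keep the label of each word's first subword and then group BIO tags into spans. The plain-Python sketch below does this. It assumes a BIO label set (`B-DISORDER_FINDING`, `I-DISORDER_FINDING`, `O`), since the model's actual `id2label` is not shown in the card. The token alignment in the example is hypothetical.

```python
from typing import Optional


def word_level_labels(word_ids: list[Optional[int]], token_labels: list[str]) -> list[str]:
    """Keep the label of the first subword of each word; skip special tokens (word id None)."""
    labels: list[str] = []
    previous = None
    for word_id, label in zip(word_ids, token_labels):
        if word_id is not None and word_id != previous:
            labels.append(label)
        previous = word_id
    return labels


def bio_spans(words: list[str], tags: list[str]) -> list[tuple[str, str]]:
    """Group BIO tags into (entity_type, text) spans; a stray I- tag starts a new span."""
    spans = []
    current = None
    for word, tag in zip(words, tags):
        prefix, _, ent_type = tag.partition("-")
        if prefix == "B" or (prefix == "I" and (current is None or current[0] != ent_type)):
            current = (ent_type, [word])
            spans.append(current)
        elif prefix == "I":
            current[1].append(word)  # continue the open span of the same type
        else:
            current = None  # "O" closes any open span
    return [(ent_type, " ".join(ws)) for ent_type, ws in spans]


# Hypothetical alignment for "Patient mit Kopfschmerzen und Fieber" (None = special token)
word_ids = [None, 0, 1, 2, 2, 2, 3, 4, None]
token_labels = ["O", "O", "O", "B-DISORDER_FINDING", "I-DISORDER_FINDING",
                "I-DISORDER_FINDING", "O", "B-DISORDER_FINDING", "O"]
words = "Patient mit Kopfschmerzen und Fieber".split()
tags = word_level_labels(word_ids, token_labels)
print(tags)
print(bio_spans(words, tags))
```

With the Advanced Usage code, `inputs.word_ids()` (available on fast tokenizers) provides the first argument and `labels` the second. When the input is passed as a list of words with `is_split_into_words=True`, the word ids index into that same list.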
 
+ ## Model Performance

+ Performance metrics will be updated after evaluation on the validation set.

+ ## Intended Use

+ This model is specifically designed for:
+ - Named Entity Recognition in German medical texts
+ - Identification of DISORDER_FINDING entities
+ - Medical document processing and analysis
+ - Clinical NLP research and applications

+ ## Limitations

+ - Trained specifically for German medical texts
+ - Performance may vary on different medical domains or institutions
+ - May require domain adaptation for optimal performance on new datasets
+ - Subject to biases present in the training data

+ ## Ethical Considerations

+ - This model processes medical data and should be used responsibly
+ - All predictions should be validated by qualified medical professionals
+ - Patient privacy and data protection regulations must be followed
+ - The model may exhibit biases from the training data

+ ## Citation

+ If you use this model, please cite:

+ ```bibtex
+ @model{demo_de_ner_model,
+   title = {TinyBERT for Demo NER (DE)},
+   author = {DH Healthcare GmbH},
+   year = {2025},
+   publisher = {Hugging Face},
+   base_model = {DedalusHealthCare/tinybert-mlm-de},
+   url = {https://huggingface.co/DedalusHealthCare/tinybert-demo-de}
+ }
+ ```

+ ## License

+ This model is proprietary and owned by DH Healthcare GmbH. All rights reserved.

+ ## Contact

+ For questions or support regarding this model, please contact DH Healthcare GmbH.