---
license: other
base_model: DedalusHealthCare/tinybert-mlm-en
datasets:
- DedalusHealthCare/ner_demo_en
task_categories:
- token-classification
task_ids:
- named-entity-recognition
language:
- en
tags:
- token-classification
- ner
- named-entity-recognition
- en
- disorder_finding
library_name: transformers
pipeline_tag: token-classification
---

# TinyBERT for Demo NER (English)

## Model Description

This model is a fine-tuned TinyBERT model for Named Entity Recognition (NER) of DISORDER_FINDING entities in English medical texts.

It was fine-tuned from the [DedalusHealthCare/tinybert-mlm-en](https://huggingface.co/DedalusHealthCare/tinybert-mlm-en) masked language model on the [DedalusHealthCare/ner_demo_en](https://huggingface.co/datasets/DedalusHealthCare/ner_demo_en) dataset.

**Base Model**: [DedalusHealthCare/tinybert-mlm-en](https://huggingface.co/DedalusHealthCare/tinybert-mlm-en)

**Training Dataset**: [DedalusHealthCare/ner_demo_en](https://huggingface.co/datasets/DedalusHealthCare/ner_demo_en)

**Task**: Token Classification (Named Entity Recognition)

**Language**: English (en)

**Entities**: DISORDER_FINDING

**Model Format**: PyTorch

**Use `aggregation_strategy="max"` when running this model in the NER pipeline (see the example below).**

## Training Details

- **Training epochs**: 1
- **Learning rate**: 5e-05
- **Training batch size**: 32
- **Evaluation batch size**: 32
- **Max sequence length**: 256
- **Warmup ratio**: 0.1
- **Weight decay**: 0.01
- **FP16**: True
- **Gradient accumulation steps**: 2
- **Save steps**: 50000
- **Evaluation steps**: 50000
- **Evaluation strategy**: steps
- **Random seed**: 1
- **Label all tokens**: True
- **Balanced training**: False
- **Chunk mode**: sliding_window
- **Stride**: 16
- **Max training samples**: None
- **Max evaluation samples**: None
- **Early stopping patience**: 0
- **Early stopping threshold**: 0.0
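
The **Label all tokens**, **Chunk mode** and **Stride** settings refer to preprocessing steps whose exact implementation in the training code is not documented here. The sketch below shows one common interpretation, modelled on the Hugging Face token-classification examples: with `label_all_tokens` enabled, a word's label is copied onto every sub-word token (otherwise only the first sub-word keeps it and the rest are masked with `-100`), and long token sequences are split into overlapping windows of at most 256 tokens. It assumes that a stride of 16 means consecutive windows share 16 tokens. `align_labels`, `sliding_windows` and `IGNORE_INDEX` are hypothetical names, not part of any library.

```python
from typing import List, Optional

IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def align_labels(word_ids: List[Optional[int]], word_labels: List[int],
                 label_all_tokens: bool = True) -> List[int]:
    """Map word-level label ids onto sub-word tokens.

    word_ids follows the format of a fast tokenizer's `word_ids()`:
    None for special tokens, otherwise the index of the source word.
    """
    token_labels = []
    previous_word = None
    for word_id in word_ids:
        if word_id is None:
            # Special tokens such as [CLS] and [SEP] never contribute to the loss.
            token_labels.append(IGNORE_INDEX)
        elif word_id != previous_word:
            # The first sub-word of a word always receives the word's label.
            token_labels.append(word_labels[word_id])
        else:
            # Continuation sub-words: copy the label, or mask it out of the loss.
            token_labels.append(word_labels[word_id] if label_all_tokens else IGNORE_INDEX)
        previous_word = word_id
    return token_labels


def sliding_windows(token_ids: List[int], max_length: int = 256,
                    stride: int = 16) -> List[List[int]]:
    """Split token_ids into windows of at most max_length tokens.

    Consecutive windows overlap by `stride` tokens (assumed meaning of
    "Stride"); room for special tokens is not reserved in this sketch.
    """
    if stride >= max_length:
        raise ValueError("stride must be smaller than max_length")
    step = max_length - stride
    windows = []
    start = 0
    while True:
        windows.append(token_ids[start:start + max_length])
        if start + max_length >= len(token_ids):
            break
        start += step
    return windows


if __name__ == "__main__":
    # Hypothetical example: the third word is split into two sub-words;
    # label id 1 stands for DISORDER_FINDING, 0 for outside.
    word_ids = [None, 0, 1, 2, 2, None]
    word_labels = [0, 0, 1]
    print(align_labels(word_ids, word_labels))
    print(align_labels(word_ids, word_labels, label_all_tokens=False))
    print([len(window) for window in sliding_windows(list(range(600)))])
```

With these defaults a 600-token document yields three windows of 256, 256 and 120 tokens, each starting 240 tokens after the previous one.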

### Build Information

- **Git Commit**: [9583c80](https://github.com/Dedalus-clinalytix/prod/commit/9583c80da9b9567b72c69d953854871a9badc139)

## Use Case Configuration

- **Use case name**: demo
- **Language**: English (en)
- **Target entities**: DISORDER_FINDING
- **Text processing max length**: N/A
- **Entity labeling scheme**: N/A

## Usage

### Using Transformers Pipeline

```python
from transformers import pipeline

# Load the model; "max" aggregation merges sub-word predictions into word-level entities
ner_pipeline = pipeline(
    "ner",
    model="DedalusHealthCare/tinybert-ner-demo-en",
    tokenizer="DedalusHealthCare/tinybert-ner-demo-en",
    aggregation_strategy="max",
)

# Example text
text = "The patient has diabetes and hypertension."

# Get predictions: a list of dicts with entity_group, score, word, start and end
entities = ner_pipeline(text)
print(entities)
```
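
With `aggregation_strategy="max"`, the Transformers pipeline assigns each word the label of its highest-scoring sub-word token and then groups adjacent words into entities. The pure-Python sketch below reproduces only the word-level step, in simplified form. The input format (parallel lists of word indices, labels and scores), the `max_aggregate` name and the `B-DISORDER_FINDING`/`O` label names are assumptions for illustration, not the pipeline's internal data structures.

```python
from typing import Dict, List, Union

WordEntity = Dict[str, Union[int, str, float]]


def max_aggregate(word_ids: List[int], token_labels: List[str],
                  token_scores: List[float]) -> List[WordEntity]:
    """Give each word the label of its highest-scoring sub-word token.

    word_ids[i] is the index of the word that token i belongs to, and
    token_labels[i] / token_scores[i] are that token's argmax label and probability.
    """
    best: Dict[int, WordEntity] = {}
    for word_id, label, score in zip(word_ids, token_labels, token_scores):
        current = best.get(word_id)
        # Keep the first token on ties; replace it only with a strictly more confident token.
        if current is None or score > current["score"]:
            best[word_id] = {"word": word_id, "entity": label, "score": score}
    # Return one entry per word, in word order.
    return [best[word_id] for word_id in sorted(best)]


if __name__ == "__main__":
    # Hypothetical tokens: "diabetes" -> word 0, "hyper" + "##tension" -> word 1
    print(max_aggregate([0, 1, 1],
                        ["B-DISORDER_FINDING", "B-DISORDER_FINDING", "O"],
                        [0.97, 0.91, 0.62]))
```

Under the `first` strategy the word would instead take the label of its first sub-word, and under `average` the per-label scores would be averaged across sub-words before a label is chosen.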

### Using AutoModel and AutoTokenizer

```python
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

# Load model and tokenizer
model_name = "DedalusHealthCare/tinybert-ner-demo-en"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Tokenize text (inputs longer than the model's limit are truncated)
text = "The patient has diabetes and hypertension."
inputs = tokenizer(text, return_tensors="pt", truncation=True)

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)
probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)

# Map the highest-probability class id of each token to its label name
predicted_token_class_ids = probabilities.argmax(dim=-1)[0]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
labels = [model.config.id2label[class_id.item()] for class_id in predicted_token_class_ids]
for token, label in zip(tokens, labels):
    print(token, label)
```
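
The `labels` list above holds one label per sub-word token, including special tokens such as `[CLS]` and `[SEP]`. To turn these into entity spans, consecutive tokens can be merged. The sketch below assumes a BIO scheme with labels such as `B-DISORDER_FINDING`, `I-DISORDER_FINDING` and `O`, and a WordPiece tokenizer that marks continuation pieces with `##`. Because this card lists the entity labeling scheme as N/A, check `model.config.id2label` before relying on it. `decode_bio`, `join_wordpieces` and `SPECIAL_TOKENS` are hypothetical helpers.

```python
from typing import Iterable, List, Optional, Tuple

# Special tokens of a BERT-style WordPiece tokenizer (assumed, not read from the model).
SPECIAL_TOKENS = frozenset({"[CLS]", "[SEP]", "[PAD]"})


def join_wordpieces(pieces: List[str]) -> str:
    """Rebuild text from WordPiece tokens, gluing "##" pieces to the previous piece."""
    text = ""
    for piece in pieces:
        if piece.startswith("##"):
            text += piece[2:]
        elif text:
            text += " " + piece
        else:
            text = piece
    return text


def decode_bio(tokens: Iterable[str], labels: Iterable[str]) -> List[Tuple[str, str]]:
    """Merge BIO-tagged tokens into (entity_type, text) spans.

    Expects labels such as "B-DISORDER_FINDING", "I-DISORDER_FINDING" or "O";
    any other label format raises ValueError when split.
    """
    spans: List[Tuple[str, List[str]]] = []
    current: Optional[Tuple[str, List[str]]] = None
    for token, label in zip(tokens, labels):
        if label == "O" or token in SPECIAL_TOKENS:
            current = None  # an outside token ends the running span
            continue
        prefix, entity_type = label.split("-", 1)
        # "I-" tags and "##" continuation pieces extend a running span of the same type.
        continues = (
            current is not None
            and current[0] == entity_type
            and (prefix == "I" or token.startswith("##"))
        )
        if continues:
            current[1].append(token)
        else:
            current = (entity_type, [token])
            spans.append(current)
    return [(entity_type, join_wordpieces(pieces)) for entity_type, pieces in spans]


if __name__ == "__main__":
    tokens = ["[CLS]", "the", "patient", "has", "diabetes", "and", "hyper", "##tension", ".", "[SEP]"]
    labels = ["O", "O", "O", "O", "B-DISORDER_FINDING", "O",
              "B-DISORDER_FINDING", "I-DISORDER_FINDING", "O", "O"]
    print(decode_bio(tokens, labels))
```

Joining token strings is only an approximation of the original text. Character offsets, returned by the tokenizer when called with `return_offsets_mapping=True` (fast tokenizers only), map spans back to the input more faithfully.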

## Model Architecture

This model is based on the TinyBERT architecture, a compact BERT-style Transformer encoder, with a token classification head that predicts one label per input token for Named Entity Recognition.
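
Conceptually, a token classification head is a linear layer applied independently to each token's final hidden state, producing one score (logit) per label; the highest-scoring label is the prediction for that token. The NumPy sketch below shows that computation with made-up weights. The function name and all sizes are illustrative assumptions, not values read from this model's configuration, and the dropout that the Hugging Face BERT head applies during training is omitted.

```python
import numpy as np


def token_classification_head(hidden_states: np.ndarray, weight: np.ndarray,
                              bias: np.ndarray) -> np.ndarray:
    """Map hidden states (seq_len, hidden_size) to logits (seq_len, num_labels).

    weight has shape (num_labels, hidden_size), the same layout as torch.nn.Linear.
    """
    return hidden_states @ weight.T + bias


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq_len, hidden_size, num_labels = 8, 312, 3  # illustrative sizes only
    hidden_states = rng.standard_normal((seq_len, hidden_size))
    weight = rng.standard_normal((num_labels, hidden_size)) * 0.02
    bias = np.zeros(num_labels)
    logits = token_classification_head(hidden_states, weight, bias)
    predicted_ids = logits.argmax(axis=-1)  # one label id per token
    print(logits.shape, predicted_ids)
```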

## Intended Use

This model is intended for:

- Named Entity Recognition in English medical texts
- Identification of DISORDER_FINDING entities
- Medical text processing and analysis
- Research and development in medical NLP

## Limitations

- Trained specifically on English medical texts
- Performance may vary on texts from other medical domains
- May not generalize well to non-medical texts
- Should be evaluated carefully before being applied to new datasets

## Ethical Considerations

- This model was trained on medical data and should be used responsibly
- Outputs should be validated by medical professionals
- Patient privacy and data protection regulations must be followed
- The model may reflect biases present in the training data

## Citation

If you use this model, please cite:

```bibtex
@misc{demo_en_ner_model,
  title = {TinyBERT for Demo NER (English)},
  author = {{DH Healthcare GmbH}},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/DedalusHealthCare/tinybert-ner-demo-en}
}
```

## License

This model is proprietary and owned by DH Healthcare GmbH. All rights reserved.

## Contact

For questions or support, please contact DH Healthcare GmbH.