# TinyBERT for Demo NER (English)

## Model Description
This model is TinyBERT fine-tuned for Named Entity Recognition (NER) of DISORDER_FINDING entities in English medical texts. It was fine-tuned from the DedalusHealthCare/tinybert-mlm-en masked language model on the DedalusHealthCare/ner_demo_en dataset.
- Base model: DedalusHealthCare/tinybert-mlm-en
- Training dataset: DedalusHealthCare/ner_demo_en
- Task: Token Classification (Named Entity Recognition)
- Language: English (en)
- Entities: DISORDER_FINDING
- Model format: PyTorch
Please use `max` as the aggregation strategy in the NER pipeline (see the usage example below).
## Training Details
- Training epochs: 1
- Learning rate: 5e-05
- Training batch size: 32
- Evaluation batch size: 32
- Max sequence length: 256
- Warmup ratio: 0.1
- Weight decay: 0.01
- FP16: True
- Gradient accumulation steps: 2
- Save steps: 50000
- Evaluation steps: 50000
- Evaluation strategy: steps
- Random seed: 1
- Label all tokens: True
- Balanced training: False
- Chunk mode: sliding_window
- Stride: 16
- Max training samples: None
- Max evaluation samples: None
- Early stopping patience: 0
- Early stopping threshold: 0.0
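The hyperparameters above map onto the `transformers` `TrainingArguments` API. A minimal sketch follows; the actual training script is not published, so the output directory name is an assumption, and sequence length, sliding-window chunking, and stride are handled during tokenization rather than here:

```python
from transformers import TrainingArguments

# Sketch of TrainingArguments matching the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="tinybert-ner-demo-en",  # assumed name, not confirmed
    num_train_epochs=1,
    learning_rate=5e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    warmup_ratio=0.1,
    weight_decay=0.01,
    fp16=True,                      # requires a CUDA GPU
    gradient_accumulation_steps=2,
    save_steps=50_000,
    eval_steps=50_000,
    eval_strategy="steps",          # `evaluation_strategy` in older transformers releases
    seed=1,
)
```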
## Build Information
- Git Commit: 9583c80
## Use Case Configuration
- Use case name: demo
- Language: English (en)
- Target entities: DISORDER_FINDING
- Text processing max length: N/A
- Entity labeling scheme: N/A
## Usage

### Using the Transformers Pipeline
```python
from transformers import pipeline

# Load the model
ner_pipeline = pipeline(
    "ner",
    model="DedalusHealthCare/tinybert-ner-demo-en",
    tokenizer="DedalusHealthCare/tinybert-ner-demo-en",
    aggregation_strategy="max",
)

# Example text
text = "The patient has diabetes and hypertension."

# Get predictions
entities = ner_pipeline(text)
print(entities)
```
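With an aggregation strategy set, each pipeline result is a dict with `entity_group`, `score`, `word`, `start`, and `end` keys. A small post-filtering helper can keep only confident spans; the function name `filter_findings`, the sample data, and the 0.5 threshold below are illustrative, not part of the model:

```python
def filter_findings(entities, min_score=0.5):
    """Keep only confident DISORDER_FINDING spans (threshold is an arbitrary example)."""
    return [
        e for e in entities
        if e["entity_group"] == "DISORDER_FINDING" and e["score"] >= min_score
    ]

# Illustrative results in the pipeline's aggregated output shape:
sample = [
    {"entity_group": "DISORDER_FINDING", "score": 0.97, "word": "diabetes", "start": 16, "end": 24},
    {"entity_group": "DISORDER_FINDING", "score": 0.31, "word": "fatigue", "start": 29, "end": 36},
]
print([e["word"] for e in filter_findings(sample)])  # → ['diabetes']
```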
### Using AutoModel and AutoTokenizer
```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Load model and tokenizer
model_name = "DedalusHealthCare/tinybert-ner-demo-en"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Tokenize text
text = "The patient has diabetes and hypertension."
tokens = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

# Get predictions
with torch.no_grad():
    outputs = model(**tokens)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

# Map predicted class ids to labels
predicted_token_class_ids = predictions.argmax(-1)
labels = [model.config.id2label[i.item()] for i in predicted_token_class_ids[0]]
```
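The per-token labels can then be merged into entity spans. Below is a minimal BIO-grouping sketch in plain Python; the helper name `group_bio` is illustrative, and the pipeline's `aggregation_strategy="max"` performs a more robust version of this (including sub-word merging):

```python
def group_bio(tokens, labels):
    """Merge BIO-tagged tokens into (entity_type, token_list) spans."""
    spans, current = [], None
    for token, label in zip(tokens, labels):
        if label == "O":
            if current:
                spans.append(current)
                current = None
            continue
        prefix, _, ent_type = label.partition("-")
        # Extend the current span on "I-" of the same type; otherwise start a new one.
        if current and current[0] == ent_type and prefix == "I":
            current[1].append(token)
        else:
            if current:
                spans.append(current)
            current = (ent_type, [token])
    if current:
        spans.append(current)
    return spans

print(group_bio(
    ["The", "patient", "has", "diabetes", "and", "hypertension", "."],
    ["O", "O", "O", "B-DISORDER_FINDING", "O", "B-DISORDER_FINDING", "O"],
))
# → [('DISORDER_FINDING', ['diabetes']), ('DISORDER_FINDING', ['hypertension'])]
```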
## Model Architecture
This model is based on the TinyBERT architecture with a token classification head for Named Entity Recognition.
## Intended Use
This model is intended for:
- Named Entity Recognition in English medical texts
- Identification of DISORDER_FINDING entities
- Medical text processing and analysis
- Research and development in medical NLP
## Limitations
- Trained specifically for English medical texts
- Performance may vary on texts from different medical domains
- May not generalize well to non-medical texts
- Requires careful evaluation on new datasets
## Ethical Considerations
- This model is trained on medical data and should be used responsibly
- Outputs should be validated by medical professionals
- Patient privacy and data protection regulations must be followed
- The model may have biases present in the training data
## Citation
If you use this model, please cite:
```bibtex
@misc{demo_en_ner_model,
  title     = {TinyBERT for Demo NER (English)},
  author    = {DH Healthcare GmbH},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/DedalusHealthCare/tinybert-ner-demo-en}
}
```
## License
This model is proprietary and owned by DH Healthcare GmbH. All rights reserved.
## Contact
For questions or support, please contact DH Healthcare GmbH.