---
license: other
base_model: DedalusHealthCare/tinybert-mlm-en
datasets:
- DedalusHealthCare/ner_demo_en
task_categories:
- token-classification
task_ids:
- named-entity-recognition
language:
- en
tags:
- token-classification
- ner
- named-entity-recognition
- en
- disorder_finding
library_name: transformers
pipeline_tag: token-classification
---
# TinyBERT for Demo NER (English)
## Model Description
This model is a fine-tuned TinyBERT model for Named Entity Recognition (NER) of DISORDER_FINDING entities in English medical texts.
It was fine-tuned from the [DedalusHealthCare/tinybert-mlm-en](https://huggingface.co/DedalusHealthCare/tinybert-mlm-en) masked language model using the [DedalusHealthCare/ner_demo_en](https://huggingface.co/datasets/DedalusHealthCare/ner_demo_en) dataset.
**Base Model**: [DedalusHealthCare/tinybert-mlm-en](https://huggingface.co/DedalusHealthCare/tinybert-mlm-en)
**Training Dataset**: [DedalusHealthCare/ner_demo_en](https://huggingface.co/datasets/DedalusHealthCare/ner_demo_en)
**Task**: Token Classification (Named Entity Recognition)
**Language**: English (en)
**Entities**: DISORDER_FINDING
**Model Format**: PyTorch
**Please use `max` as aggregation strategy in the NER pipeline (see example below)**.
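To illustrate what the `max` strategy does, the following minimal sketch mimics its word-level rule in plain Python: among the sub-word tokens of one word, the token with the highest score decides the label of the whole word. The function name `aggregate_word_max`, the sample labels, and the scores are hypothetical, and the `##` continuation marker assumes a BERT-style WordPiece tokenizer. The actual logic lives in the Transformers token-classification pipeline and also handles offsets and entity grouping.

```python
from typing import List, Tuple


def aggregate_word_max(pieces: List[Tuple[str, str, float]]) -> Tuple[str, str, float]:
    """Collapse the sub-word tokens of ONE word into a single prediction.

    Each piece is (token_text, predicted_label, score).
    """
    if not pieces:
        raise ValueError("a word needs at least one token")
    # The most confident token decides the label and score of the word.
    _, label, score = max(pieces, key=lambda piece: piece[2])
    # Rebuild the surface form by dropping WordPiece continuation markers
    # (assumption: continuation tokens start with "##").
    word = "".join(text[2:] if text.startswith("##") else text for text, _, _ in pieces)
    return word, label, score


# Hypothetical sub-word predictions for the word "hypertension".
pieces = [("hyper", "O", 0.55), ("##tension", "DISORDER_FINDING", 0.91)]
print(aggregate_word_max(pieces))
```

In this example the second token is more confident than the first, so the whole word is reported as `DISORDER_FINDING` rather than being split or dropped.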
## Training Details
- **Training epochs**: 1
- **Learning rate**: 5e-05
- **Training batch size**: 32
- **Evaluation batch size**: 32
- **Max sequence length**: 256
- **Warmup ratio**: 0.1
- **Weight decay**: 0.01
- **FP16**: True
- **Gradient accumulation steps**: 2
- **Save steps**: 50000
- **Evaluation steps**: 50000
- **Evaluation strategy**: steps
- **Random seed**: 1
- **Label all tokens**: True
- **Balanced training**: False
- **Chunk mode**: sliding_window
- **Stride**: 16
- **Max training samples**: None
- **Max evaluation samples**: None
- **Early stopping patience**: 0
- **Early stopping threshold**: 0.0
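The `sliding_window` chunk mode with a maximum sequence length of 256 and a stride of 16 can be pictured with the sketch below. It assumes that "Stride" means the number of tokens shared by consecutive windows, as with the `stride` argument of Hugging Face tokenizers; the training code may define it differently (for example as the step size). The function name `sliding_window_chunks` is hypothetical, and special tokens are ignored for simplicity.

```python
from typing import List


def sliding_window_chunks(
    token_ids: List[int], max_length: int = 256, overlap: int = 16
) -> List[List[int]]:
    """Split a token sequence into windows of at most `max_length` tokens.

    Consecutive windows share `overlap` tokens (assumed meaning of "Stride").
    """
    if not 0 <= overlap < max_length:
        raise ValueError("overlap must be in [0, max_length)")
    step = max_length - overlap  # how far each new window advances
    chunks = []
    start = 0
    while True:
        chunks.append(token_ids[start:start + max_length])
        # Stop once the current window reaches the end of the sequence.
        if start + max_length >= len(token_ids):
            break
        start += step
    return chunks


# A hypothetical document of 600 tokens, represented by their positions.
chunks = sliding_window_chunks(list(range(600)))
print([(chunk[0], chunk[-1]) for chunk in chunks])
```

Under these assumptions a 600-token document yields three windows, and each window repeats the last 16 tokens of the previous one so that entities near a boundary appear with context on both sides at least once.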
### Build Information
- **Git Commit**: [9583c80](https://github.com/Dedalus-clinalytix/prod/commit/9583c80da9b9567b72c69d953854871a9badc139)
## Use Case Configuration
- **Use case name**: demo
- **Language**: English (en)
- **Target entities**: DISORDER_FINDING
- **Text processing max length**: N/A
- **Entity labeling scheme**: N/A
## Usage
### Using Transformers Pipeline
```python
from transformers import pipeline

# Load the model and tokenizer into a token-classification pipeline
ner_pipeline = pipeline(
    "ner",
    model="DedalusHealthCare/tinybert-ner-demo-en",
    tokenizer="DedalusHealthCare/tinybert-ner-demo-en",
    aggregation_strategy="max",
)

# Example text
text = "The patient has diabetes and high blood pressure."

# Get predictions: one dict per entity with entity_group, score, word, start, end
entities = ner_pipeline(text)
print(entities)
```
### Using AutoModel and AutoTokenizer
```python
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

# Load model and tokenizer
model_name = "DedalusHealthCare/tinybert-ner-demo-en"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Tokenize text
text = "The patient has diabetes and high blood pressure."
tokens = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

# Get per-token class probabilities without tracking gradients
with torch.no_grad():
    outputs = model(**tokens)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

# Map the most probable class id of each token to its label name
predicted_token_class_ids = predictions.argmax(-1)
labels = [model.config.id2label[class_id.item()] for class_id in predicted_token_class_ids[0]]
```
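The snippet above ends with a list of label names, one per token. A minimal sketch of how those labels might be lined up with the tokens follows. It assumes the token strings come from `tokenizer.convert_ids_to_tokens(tokens["input_ids"][0].tolist())` and that special tokens follow BERT conventions (`[CLS]`, `[SEP]`, `[PAD]`). The helper name `pair_tokens_with_labels` and the sample tokens and labels are hypothetical, not actual model output.

```python
from typing import List, Tuple

# Assumption: BERT-style special tokens that carry no entity information.
SPECIAL_TOKENS = {"[CLS]", "[SEP]", "[PAD]"}


def pair_tokens_with_labels(
    token_strings: List[str], labels: List[str]
) -> List[Tuple[str, str]]:
    """Pair each token with its predicted label, skipping special tokens."""
    if len(token_strings) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    return [
        (token, label)
        for token, label in zip(token_strings, labels)
        if token not in SPECIAL_TOKENS
    ]


# Hypothetical tokens and labels, standing in for real tokenizer and model output.
token_strings = ["[CLS]", "the", "patient", "has", "diabetes", ".", "[SEP]"]
labels = ["O", "O", "O", "O", "DISORDER_FINDING", "O", "O"]

for token, label in pair_tokens_with_labels(token_strings, labels):
    print(f"{token}\t{label}")
```

Note that this manual path yields token-level labels only; the pipeline with `aggregation_strategy="max"` additionally merges sub-word tokens into words and entities.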
## Model Architecture
This model is based on the TinyBERT architecture with a token classification head for Named Entity Recognition.
## Intended Use
This model is intended for:
- Named Entity Recognition in English medical texts
- Identification of DISORDER_FINDING entities
- Medical text processing and analysis
- Research and development in medical NLP
## Limitations
- Trained specifically for English medical texts
- Performance may vary on texts from different medical domains
- May not generalize well to non-medical texts
- Requires careful evaluation on new datasets
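One way to approach evaluation on a new dataset is entity-level precision, recall, and F1 with exact span matching. The sketch below is only an illustration under stated assumptions: spans are `(start_char, end_char, entity_type)` tuples, a prediction counts as correct only if all three fields match a gold span, and the function name `span_prf` and the sample spans are hypothetical. It is not the evaluation procedure used for this model.

```python
from typing import Dict, Set, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, entity_type)


def span_prf(gold: Set[Span], predicted: Set[Span]) -> Dict[str, float]:
    """Exact-match precision, recall, and F1 over entity spans."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical annotations for the sentence
# "The patient has diabetes and high blood pressure."
gold = {(16, 24, "DISORDER_FINDING"), (29, 48, "DISORDER_FINDING")}
predicted = {(16, 24, "DISORDER_FINDING"), (0, 11, "DISORDER_FINDING")}

print(span_prf(gold, predicted))
```

Here one of two predicted spans is correct and one of two gold spans is found, so precision, recall, and F1 all come out at 0.5.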
## Ethical Considerations
- This model is trained on medical data and should be used responsibly
- Outputs should be validated by medical professionals
- Patient privacy and data protection regulations must be followed
- The model may have biases present in the training data
## Citation
If you use this model, please cite:
```bibtex
@misc{demo_en_ner_model,
  title = {TinyBERT for Demo NER (English)},
  author = {DH Healthcare GmbH},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/DedalusHealthCare/tinybert-ner-demo-en}
}
```
## License
This model is proprietary and owned by DH Healthcare GmbH. All rights reserved.
## Contact
For questions or support, please contact DH Healthcare GmbH.