	---
	license: other
	base_model: DedalusHealthCare/tinybert-mlm-en
	datasets:
	- DedalusHealthCare/ner_demo_en
	task_categories:
	- token-classification
	task_ids:
	- named-entity-recognition
	language:
	- en
	tags:
	- token-classification
	- ner
	- named-entity-recognition
	- en
	- disorder_finding
	library_name: transformers
	pipeline_tag: token-classification
	---

	# TinyBERT for Demo NER (English)

	## Model Description

	This model is a fine-tuned TinyBERT model for Named Entity Recognition (NER) of DISORDER_FINDING entities in English medical texts.

	It was fine-tuned from the [DedalusHealthCare/tinybert-mlm-en](https://huggingface.co/DedalusHealthCare/tinybert-mlm-en) masked language model using the [DedalusHealthCare/ner_demo_en](https://huggingface.co/datasets/DedalusHealthCare/ner_demo_en) dataset.

	Base Model: [DedalusHealthCare/tinybert-mlm-en](https://huggingface.co/DedalusHealthCare/tinybert-mlm-en)

	Training Dataset: [DedalusHealthCare/ner_demo_en](https://huggingface.co/datasets/DedalusHealthCare/ner_demo_en)

	Task: Token Classification (Named Entity Recognition)

	Language: English (en)

	Entities: DISORDER_FINDING

	Model Format: PyTorch

	Use `max` as the aggregation strategy in the NER pipeline (see the example under Usage).
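To illustrate what the `max` strategy does, here is a minimal pure-Python sketch of the idea: when a word is split into several subword tokens, the word takes the label of the subtoken with the highest score. The helper name `aggregate_word_max`, the label strings, and the scores are illustrative assumptions, not output of this model, and the real pipeline works on full score vectors rather than single (label, score) pairs.

```python
def aggregate_word_max(subtokens):
    """Pick one label for a word from its subword predictions.

    `subtokens` is a list of (label, score) pairs, one per subword token
    of a single word. This is a simplified illustration of the idea
    behind aggregation_strategy="max", not the library implementation.
    """
    if not subtokens:
        raise ValueError("a word needs at least one subtoken")
    # The subtoken with the highest score decides the label of the whole word.
    return max(subtokens, key=lambda pair: pair[1])


# Hypothetical predictions for a word split into three subword pieces.
pieces = [("O", 0.55), ("DISORDER_FINDING", 0.91), ("O", 0.60)]
label, score = aggregate_word_max(pieces)
print(label, score)
```

With this strategy a single confident subtoken is enough to tag the whole word, which avoids entities that cover only part of a word.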

	## Training Details

	- Training epochs: 1
	- Learning rate: 5e-05
	- Training batch size: 32
	- Evaluation batch size: 32
	- Max sequence length: 256
	- Warmup ratio: 0.1
	- Weight decay: 0.01
	- FP16: True
	- Gradient accumulation steps: 2
	- Save steps: 50000
	- Evaluation steps: 50000
	- Evaluation strategy: steps
	- Random seed: 1
	- Label all tokens: True
	- Balanced training: False
	- Chunk mode: sliding_window
	- Stride: 16
	- Max training samples: None
	- Max evaluation samples: None
	- Early stopping patience: 0
	- Early stopping threshold: 0.0
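Two of the settings above interact in ways that are easy to miss: the batch size combined with gradient accumulation, and the sliding-window chunking of long texts. The sketch below works through both. It assumes that `Stride` means the number of tokens shared by consecutive windows (as in Hugging Face tokenizers) and ignores special tokens; the training code may define it differently. `TrainingSetup` and `sliding_windows` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSetup:
    # Values copied from the Training Details list.
    train_batch_size: int = 32
    gradient_accumulation_steps: int = 2
    max_seq_length: int = 256
    stride: int = 16  # assumed to be the overlap between windows

    @property
    def effective_batch_size(self) -> int:
        # Samples contributing to one optimizer step on a single device.
        return self.train_batch_size * self.gradient_accumulation_steps


def sliding_windows(token_ids, max_len, overlap):
    """Split a token sequence into windows of at most `max_len` tokens,
    where consecutive windows share `overlap` tokens."""
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")
    step = max_len - overlap
    windows = []
    start = 0
    while True:
        windows.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break  # the last window reached the end of the sequence
        start += step
    return windows


setup = TrainingSetup()
windows = sliding_windows(list(range(600)), setup.max_seq_length, setup.stride)
print(setup.effective_batch_size)
print([(w[0], w[-1]) for w in windows])
```

Under these assumptions, a 600-token document yields three windows, and each optimizer step sees 64 samples per device.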


	### Build Information
	- Git Commit: [9583c80](https://github.com/Dedalus-clinalytix/prod/commit/9583c80da9b9567b72c69d953854871a9badc139)

	## Use Case Configuration

	- Use case name: demo
	- Language: English (en)
	- Target entities: DISORDER_FINDING
	- Text processing max length: N/A
	- Entity labeling scheme: N/A

	## Usage

	### Using Transformers Pipeline

	```python
	from transformers import pipeline

	# Load the model
	ner_pipeline = pipeline(
	    "ner",
	    model="DedalusHealthCare/tinybert-ner-demo-en",
	    tokenizer="DedalusHealthCare/tinybert-ner-demo-en",
	    aggregation_strategy="max",
	)

	# Example text
	text = "The patient has diabetes and hypertension."

	# Get predictions
	entities = ner_pipeline(text)
	print(entities)
	```
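With an aggregation strategy set, the pipeline returns a list of dictionaries with the keys `entity_group`, `score`, `word`, `start`, and `end`. The sketch below shows one way to post-process such a list: keep only confident predictions and recover the surface text from the character offsets. The helper `confident_entities`, the 0.5 threshold, and the mock scores are assumptions for illustration, not recommendations from the model authors.

```python
def confident_entities(entities, text, min_score=0.5):
    """Keep entities scoring at least `min_score` and slice their
    surface form out of the original text using the character offsets."""
    kept = []
    for ent in entities:
        if ent["score"] >= min_score:
            kept.append(
                {
                    "label": ent["entity_group"],
                    "text": text[ent["start"]:ent["end"]],
                    "score": float(ent["score"]),
                }
            )
    return kept


text = "The patient has diabetes and hypertension."

# Mock pipeline output with made-up scores; real output comes from ner_pipeline(text).
mock_output = [
    {"entity_group": "DISORDER_FINDING", "score": 0.97, "word": "diabetes", "start": 16, "end": 24},
    {"entity_group": "DISORDER_FINDING", "score": 0.41, "word": "hypertension", "start": 29, "end": 41},
]

result = confident_entities(mock_output, text)
print(result)
```

Slicing the original text by offset is usually more reliable than the `word` field, which is reconstructed from tokens and may differ in casing or spacing.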

	### Using AutoModel and AutoTokenizer

	```python
	from transformers import AutoTokenizer, AutoModelForTokenClassification
	import torch

	# Load model and tokenizer
	model_name = "DedalusHealthCare/tinybert-ner-demo-en"
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForTokenClassification.from_pretrained(model_name)

	# Tokenize text
	text = "The patient has diabetes and hypertension."
	tokens = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

	# Get predictions
	with torch.no_grad():
	    outputs = model(**tokens)
	    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

	# Get labels
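	# argmax over the label dimension gives one class id per token (special tokens included)
	predicted_token_class_ids = predictions.argmax(-1)
	labels = [model.config.id2label[label_id.item()] for label_id in predicted_token_class_ids[0]]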
	```
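The `labels` list above has one entry per token, including special tokens and subword pieces, so it usually needs to be mapped back to words. The token strings can be obtained with `tokenizer.convert_ids_to_tokens(tokens["input_ids"][0])`. The sketch below assumes BERT-style WordPiece tokens (continuation pieces prefixed with `##`) and keeps the label of the first piece of each word, which is a simpler convention than the `max` strategy. `merge_wordpieces` and the sample tokens and labels are hypothetical.

```python
def merge_wordpieces(tokens, labels, special_tokens=("[CLS]", "[SEP]", "[PAD]")):
    """Merge WordPiece tokens back into words.

    Each word keeps the label predicted for its first piece;
    special tokens are dropped.
    """
    words = []
    for token, label in zip(tokens, labels):
        if token in special_tokens:
            continue
        if token.startswith("##") and words:
            # Continuation piece: append it to the previous word, keep that word's label.
            prev_word, prev_label = words[-1]
            words[-1] = (prev_word + token[2:], prev_label)
        else:
            words.append((token, label))
    return words


# Illustrative tokens and labels, not actual model output.
sample_tokens = ["[CLS]", "the", "patient", "has", "diabetes", "and", "hyper", "##tension", ".", "[SEP]"]
sample_labels = ["O", "O", "O", "O", "DISORDER_FINDING", "O", "DISORDER_FINDING", "DISORDER_FINDING", "O", "O"]

merged = merge_wordpieces(sample_tokens, sample_labels)
print(merged)
```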

	## Model Architecture

	This model is based on the TinyBERT architecture with a token classification head for Named Entity Recognition.

	## Intended Use

	This model is intended for:
	- Named Entity Recognition in English medical texts
	- Identification of DISORDER_FINDING entities
	- Medical text processing and analysis
	- Research and development in medical NLP

	## Limitations

	- Trained specifically for English medical texts
	- Performance may vary on texts from different medical domains
	- May not generalize well to non-medical texts
	- Requires careful evaluation on new datasets
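One common way to evaluate an NER model on a new dataset is exact-match, entity-level precision, recall, and F1 over (start, end, label) spans. The sketch below is a minimal version of that metric; it is not the evaluation procedure used for this model, which the card does not describe. `span_prf` and the example spans are illustrative assumptions.

```python
def span_prf(gold, predicted):
    """Exact-match precision, recall, and F1 over (start, end, label) spans."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical annotations for "The patient has diabetes and hypertension."
gold = [(16, 24, "DISORDER_FINDING"), (29, 41, "DISORDER_FINDING")]
pred = [(16, 24, "DISORDER_FINDING"), (0, 11, "DISORDER_FINDING")]

precision, recall, f1 = span_prf(gold, pred)
print(precision, recall, f1)
```

Exact matching is strict: a prediction with a boundary off by one character counts as both a false positive and a false negative, so relaxed (overlap-based) matching may be worth reporting alongside it.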

	## Ethical Considerations

	- This model is trained on medical data and should be used responsibly
	- Outputs should be validated by medical professionals
	- Patient privacy and data protection regulations must be followed
	- The model may have biases present in the training data

	## Citation

	If you use this model, please cite:

	```bibtex
	@misc{demo_en_ner_model,
	  title = {TinyBERT for Demo NER (English)},
	  author = {{DH Healthcare GmbH}},
	  year = {2025},
	  publisher = {Hugging Face},
	  url = {https://huggingface.co/DedalusHealthCare/tinybert-ner-demo-en}
	}
	```

	## License

	This model is proprietary and owned by DH Healthcare GmbH. All rights reserved.

	## Contact

	For questions or support, please contact DH Healthcare GmbH.