---
license: cc-by-nc-4.0
language: he
base_model: onlplab/alephbert-base
tags:
- text-classification
- hebrew
- medical
---

# MedTextBERT

A Hebrew medical document classifier fine-tuned from [AlephBERT](https://huggingface.co/onlplab/alephbert-base).
It classifies text extracted from medical documents into one of 24 categories, covering a wide range of medical specialties as well as imaging, lab results, pharmacy, and administrative documents.

Built as part of a privacy-first Android app that performs fully offline OCR on Hebrew medical documents.

## Performance

| Metric | Score |
|--------|-------|
| Accuracy | 93.8% |
| F1 | 93.75% |

Evaluated on a held-out test set after 20 epochs of fine-tuning.
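The card reports a single F1 score without stating how it was averaged over the 24 categories. For readers reproducing the numbers, here is a plain-Python sketch of accuracy plus macro- and weighted-averaged F1, intended to mirror the `average="macro"` and `average="weighted"` options of scikit-learn's `f1_score`. It is not the author's evaluation code, and the labels and predictions are made-up toy values.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of documents whose predicted category matches the gold label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_f1(y_true, y_pred):
    """F1 for every label seen in y_true or y_pred (0.0 when precision + recall is 0)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (every category counts equally)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Per-class F1 weighted by support, i.e. how often each class occurs in y_true."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[label] * support[label] for label in scores) / len(y_true)


# Toy example with hypothetical gold labels and predictions.
y_true = ["cardiology", "cardiology", "imaging", "lab_results"]
y_pred = ["cardiology", "imaging", "imaging", "lab_results"]
print(
    f"accuracy={accuracy(y_true, y_pred):.3f} "
    f"macro_f1={macro_f1(y_true, y_pred):.3f} "
    f"weighted_f1={weighted_f1(y_true, y_pred):.3f}"
)
# accuracy=0.750 macro_f1=0.778 weighted_f1=0.750
```

Macro averaging treats rare and common categories equally, while weighted averaging follows the class distribution of the test set, so the two can differ noticeably when some of the 24 categories are rare.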

## Categories

`family_medicine` `cardiology` `cardiology_procedures` `imaging`
`diabetes_endocrinology` `pathology` `pediatrics` `orthopedics`
`neurology` `psychiatry` `urology` `surgery` `gastroenterology`
`hematology` `pulmonology` `dermatology` `infections_inflammation`
`gynecology` `oncology` `pharmacy` `emergency_medicine`
`geriatrics_rehabilitation` `administration_general` `lab_results`
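If these category names are meant to match the model's output labels exactly (an assumption; the card does not show the model's `id2label` mapping), a small check like the hypothetical `compare_label_sets` helper can confirm that before downstream code relies on the label strings.

```python
# The 24 category names listed in this card. ASSUMPTION: these are the exact
# label strings stored in the model config's id2label mapping.
EXPECTED_LABELS = frozenset({
    "family_medicine", "cardiology", "cardiology_procedures", "imaging",
    "diabetes_endocrinology", "pathology", "pediatrics", "orthopedics",
    "neurology", "psychiatry", "urology", "surgery", "gastroenterology",
    "hematology", "pulmonology", "dermatology", "infections_inflammation",
    "gynecology", "oncology", "pharmacy", "emergency_medicine",
    "geriatrics_rehabilitation", "administration_general", "lab_results",
})


def compare_label_sets(id2label):
    """Hypothetical helper: compare a {id: label} mapping with the card's list.

    Returns (missing, unexpected): labels listed in the card but absent from
    id2label, and labels in id2label that the card does not list.
    """
    actual = set(id2label.values())
    missing = sorted(EXPECTED_LABELS - actual)
    unexpected = sorted(actual - EXPECTED_LABELS)
    return missing, unexpected
```

For example, `compare_label_sets(AutoConfig.from_pretrained("annaadar/MedTextBERT").id2label)` (with `AutoConfig` imported from `transformers`) should return two empty lists if the config uses exactly these names; generic names such as `LABEL_0` would show up as unexpected.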

## Training Data

Fine-tuned on a synthetically generated dataset of 4,500+ labeled Hebrew medical documents.
The dataset was designed to include edge cases and variations within each category, with the goal of improving generalization to real-world document formats.

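## Usage

```python
from transformers import pipeline

# Load the fine-tuned classifier and its tokenizer from the Hugging Face Hub
classifier = pipeline(
    "text-classification",
    model="annaadar/MedTextBERT",
    tokenizer="annaadar/MedTextBERT",
)

# Example input: "After a routine blood test, normal values were found"
result = classifier("לאחר בדיקת דם שגרתית, נמצאו ערכים תקינים")

# A list holding the top prediction as a {'label': ..., 'score': ...} dict
print(result)
```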
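Full OCR'd pages can be longer than the encoder's maximum input length (512 tokens for BERT-base models such as AlephBERT, assumed to apply here), and a single top label hides how confident the model is. One way to handle both when classifying several documents is sketched in the hypothetical `classify_batch` helper: it requests scores for all labels (`top_k=None`), passes `truncation=True` through to the tokenizer, and flags predictions below an arbitrary `min_score` for manual review. Truncation drops text past the limit, so a long document is classified from its beginning only.

```python
def classify_batch(classifier, texts, min_score=0.5, batch_size=8):
    """Hypothetical helper: classify several texts and flag low-confidence results.

    `classifier` is a transformers text-classification pipeline, or any callable
    with the same call signature and output format.
    """
    # top_k=None returns a list of {'label', 'score'} dicts for every label,
    # one list per input text. truncation=True is forwarded to the tokenizer so
    # texts longer than the model's maximum input length are cut off.
    outputs = classifier(texts, top_k=None, truncation=True, batch_size=batch_size)
    results = []
    for scores in outputs:
        best = max(scores, key=lambda s: s["score"])
        results.append({
            "label": best["label"],
            "score": best["score"],
            # min_score is an arbitrary example threshold, not a value from this card.
            "needs_review": best["score"] < min_score,
        })
    return results
```

With the `classifier` created in the usage snippet, `classify_batch(classifier, texts)` returns one dict per input text, in the same order.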

## Limitations

- Trained on synthetic data; performance on real-world clinical documents may vary
- Designed for Hebrew text only
- Not validated for clinical or diagnostic use
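Since the model targets Hebrew only, a caller may want to flag inputs where OCR produced mostly non-Hebrew text before classifying them. A minimal heuristic sketch, with hypothetical function names and an arbitrary 0.5 cutoff: it measures the share of alphabetic characters that fall in the Hebrew letter range U+05D0 to U+05EA (alef through tav, including final forms).

```python
def hebrew_ratio(text):
    """Fraction of alphabetic characters that are Hebrew letters (U+05D0..U+05EA).

    Digits, punctuation, and whitespace are ignored; returns 0.0 if the text
    has no alphabetic characters at all.
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    hebrew = sum("\u05d0" <= ch <= "\u05ea" for ch in letters)
    return hebrew / len(letters)


def looks_hebrew(text, min_ratio=0.5):
    """Hypothetical pre-check before classification; min_ratio is an example value."""
    return hebrew_ratio(text) >= min_ratio
```

Medical text often mixes in Latin abbreviations (for example test or drug names), which lowers the ratio, so any cutoff would need tuning on real documents.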

## Intended Use

Research and portfolio purposes only.
Not intended for clinical or commercial use.
License: CC BY-NC 4.0