annaadar/MedTextBERT

Text Classification


annaadar committed on May 3

Commit 8f0ceef · verified · 1 Parent(s): 4f7deaa

Update README.md

Files changed (1)

README.md +61 -3

@@ -1,3 +1,61 @@
----
-license: apache-2.0
----

+---
+license: cc-by-nc-4.0
+language: he
+base_model: onlplab/alephbert-base
+tags:
+  - text-classification
+  - hebrew
+  - medical
+---
+# MedTextBERT
+A Hebrew medical document classifier fine-tuned from [AlephBERT](https://huggingface.co/onlplab/alephbert-base).
+Classifies extracted text into 24 document categories covering a wide range of medical specialties.
+Built as part of a privacy-first Android app that performs 100% offline OCR on Hebrew medical documents.
+## Performance
+| Metric | Score |
+|--------|-------|
+| Accuracy | 93.8% |
+| F1 | 93.75% |
+Evaluated on a held-out test set after 20 epochs of fine-tuning.
+## Categories
+`family_medicine` `cardiology` `cardiology_procedures` `imaging`
+`diabetes_endocrinology` `pathology` `pediatrics` `orthopedics`
+`neurology` `psychiatry` `urology` `surgery` `gastroenterology`
+`hematology` `pulmonology` `dermatology` `infections_inflammation`
+`gynecology` `oncology` `pharmacy` `emergency_medicine`
+`geriatrics_rehabilitation` `administration_general` `lab_results`
+## Training Data
+Fine-tuned on a synthetically generated dataset of 4,500+ labeled Hebrew medical documents,
+covering edge cases and category variations to improve generalization across real-world formats.
+## Usage
+```python
+from transformers import pipeline
+classifier = pipeline("text-classification", model="annaadar/MedTextBERT")
+result = classifier("לאחר בדיקת דם שגרתית, נמצאו ערכים תקינים")
+print(result)
+```
+## Limitations
+- Trained on synthetic data; performance on real-world clinical documents may vary
+- Designed for Hebrew text only
+- Not validated for clinical or diagnostic use
+## Intended Use
+Research and portfolio purposes only.
+Not intended for clinical or commercial use.
+License: CC BY-NC 4.0
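
The Usage snippet in the README prints the raw pipeline output. As a minimal sketch of how a caller might reduce that output to a single category, the helper below requests a score for every label and returns the best one only if it clears a confidence cutoff. The names `pick_label` and `classify` and the `0.5` cutoff are hypothetical and not part of the model card; given the synthetic training data noted under Limitations, any real cutoff would need to be tuned on real documents.

```python
from typing import Any, Callable, Dict, List, Optional


def pick_label(scores: List[Dict[str, Any]], min_score: float = 0.5) -> Optional[str]:
    """Return the highest-scoring label, or None if it is below min_score.

    `scores` is a list of {"label": str, "score": float} dicts, the shape
    produced by a transformers text-classification pipeline.
    The 0.5 default is an arbitrary illustrative value.
    """
    if not scores:
        return None
    best = max(scores, key=lambda item: item["score"])
    return best["label"] if best["score"] >= min_score else None


def classify(classifier: Callable[..., Any], text: str, min_score: float = 0.5) -> Optional[str]:
    """Run a text-classification pipeline on one document and pick a category."""
    # top_k=None asks the pipeline for a score per label instead of only the top one.
    scores = classifier(text, top_k=None)
    # Depending on the transformers version, a single input may come back
    # wrapped in an extra list; unwrap it so both shapes are handled.
    if scores and isinstance(scores[0], list):
        scores = scores[0]
    return pick_label(scores, min_score)
```

With `classifier = pipeline("text-classification", model="annaadar/MedTextBERT")` as in the Usage section, `classify(classifier, text)` would return a category name such as `lab_results`, or `None` when no label reaches the cutoff.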