MedTextBERT / README.md

annaadar

metadata

license: cc-by-nc-4.0
language: he
base_model: onlplab/alephbert-base
tags:
  - text-classification
  - hebrew
  - medical

MedTextBERT

A Hebrew medical document classifier fine-tuned from AlephBERT (onlplab/alephbert-base).
Classifies extracted text into 24 document categories covering a wide range of medical specialties.

Built as part of a privacy-first Android app that performs 100% offline OCR on Hebrew medical documents.

Performance

Metric	Score
Accuracy	93.8%
F1	93.75%

Evaluated on a held-out test set after 20 epochs of fine-tuning.
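For readers who want to reproduce numbers of this kind on their own test split, the following is a minimal sketch of how accuracy and F1 can be computed from gold and predicted labels. The README does not say which F1 averaging was used, so the macro average here is an assumption, and the category names are hypothetical placeholders rather than the model's real 24 categories.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of documents whose predicted category matches the gold one."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-category F1 (macro averaging is an assumption)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted category p, but it was wrong
            fn[t] += 1  # gold category t was missed
    labels = set(y_true) | set(y_pred)
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels, not the model's real categories.
gold = ["lab", "lab", "referral", "discharge"]
pred = ["lab", "referral", "referral", "discharge"]

print(f"accuracy: {accuracy(gold, pred):.4f}")
print(f"macro F1: {macro_f1(gold, pred):.4f}")
```

With many categories of uneven size, macro and weighted F1 can differ noticeably, so it is worth checking both when comparing against the table above.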

Training Data

Fine-tuned on a synthetically generated dataset of 4,500+ labeled Hebrew medical documents, covering edge cases and category variations to improve generalization across real-world formats.

Usage

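from transformers import pipeline

# Load the fine-tuned classifier and its tokenizer from the Hugging Face Hub.
classifier = pipeline(
    "text-classification",
    model="annaadar/MedTextBERT",
    tokenizer="annaadar/MedTextBERT"
)

# Input (Hebrew): "After a routine blood test, normal values were found"
result = classifier("לאחר בדיקת דם שגרתית, נמצאו ערכים תקינים")
# result is a list of dicts, each with a "label" and a "score" key.
print(result)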
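The pipeline call above returns a list of dicts with "label" and "score" keys. A minimal sketch of turning that output into a single decision follows. The `pick_label` helper, its `threshold` and `fallback` parameters, and the "LABEL_7" value are all hypothetical; the actual label names depend on the model's configuration.

```python
def pick_label(predictions, threshold=0.5, fallback="uncertain"):
    """Return the highest-scoring label, or `fallback` if its score is below `threshold`.

    `predictions` is expected in the shape the text-classification pipeline
    produces: a list of {"label": str, "score": float} dicts.
    """
    if not predictions:
        return fallback
    best = max(predictions, key=lambda p: p["score"])
    return best["label"] if best["score"] >= threshold else fallback


# Hand-written sample in the pipeline's output shape; "LABEL_7" is a placeholder.
sample = [{"label": "LABEL_7", "score": 0.91}]
print(pick_label(sample))
```

In practice this would be called as `pick_label(classifier(text))`. Routing low-confidence documents to a fallback bucket is one possible design, not something the original app is known to do.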

Limitations

Trained on synthetic data; performance on real-world clinical documents may vary
Designed for Hebrew text only
Not validated for clinical or diagnostic use
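Because the model is designed for Hebrew text only, callers may want to screen inputs before classifying them. The sketch below estimates the share of Hebrew letters using the Unicode Hebrew block (U+0590 to U+05FF). The helper names and the 0.5 cutoff are assumptions for illustration, not part of the model or the app.

```python
def hebrew_ratio(text):
    """Share of alphabetic characters that fall in the Unicode Hebrew block."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    hebrew = sum(1 for ch in letters if "\u0590" <= ch <= "\u05FF")
    return hebrew / len(letters)


def looks_hebrew(text, min_ratio=0.5):
    """True when at least `min_ratio` of the letters are Hebrew (cutoff is an assumption)."""
    return hebrew_ratio(text) >= min_ratio


print(looks_hebrew("בדיקת דם"))
print(looks_hebrew("blood test"))
```

Digits, punctuation, and whitespace are ignored, so lab values and dates inside a Hebrew document do not lower the ratio.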

Intended Use

Research and portfolio purposes only.
Not intended for clinical or commercial use.
License: CC BY-NC 4.0
