Hamnevise
Persian diacritization using a masked language model (MLM), HooshvareLab/bert-fa-base-uncased, for homographs: words that share the same written form but take different diacritics, and therefore different pronunciations, depending on the sentence they appear in.
Architecture:
Input → ParsBERT (context) + Char CNN (morphology)
→ Shared Fusion
→ Word-Specific Classifiers
→ Only valid outputs per word
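The last two stages of the pipeline can be illustrated with a minimal sketch. It assumes each homograph has its own classifier head that emits one score per valid diacritized form, so an invalid form can never be predicted. The `VALID_FORMS` table, the `predict` helper, and the example scores are hypothetical; the encoder, the character CNN, and the fusion layer are left out, and the actual implementation may differ.

```python
import math

# Hypothetical label inventory: each homograph maps to its valid diacritized forms.
VALID_FORMS = {
    "اشکال": ["اِشکال", "اَشکال"],
}


def softmax(scores):
    """Turn raw scores into probabilities (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(word, scores):
    """Pick the most likely form among the valid forms of `word` only.

    `scores` stands in for the output of the word-specific head, with
    exactly one entry per valid form of that word.
    """
    forms = VALID_FORMS[word]
    if len(scores) != len(forms):
        raise ValueError("expected one score per valid form")
    probs = softmax(scores)
    best = max(range(len(forms)), key=probs.__getitem__)
    return forms[best], probs[best]


# Made-up scores for illustration; the second form wins here.
form, prob = predict("اشکال", [0.2, 1.7])
```

Because the output space of each head is limited to the forms of its own word, no masking over a global vocabulary is needed at inference time.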
📊 Dataset Format
CSV file with three columns:
sentence,word,word_with_diacritics
اشکال در سیستم گرمایش، باعث سرد شدن ساختمان شد.,اشکال,اِشکال
اشکال در فرهنگهای باستانی، نمادها و معانی خاصی داشتهاند.,اشکال,اَشکال
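A file in this format can be read with Python's standard `csv` module. The sketch below parses the two sample rows above, groups the diacritized forms by word, and flags rows whose diacritized form does not reduce to the plain word once diacritics are removed. The helper names (`strip_diacritics`, `load_rows`, `build_candidates`) are illustrative and not part of this repository.

```python
import csv
import io
import unicodedata
from collections import defaultdict

# The two sample rows from above, inlined so the example is self-contained.
SAMPLE = """sentence,word,word_with_diacritics
اشکال در سیستم گرمایش، باعث سرد شدن ساختمان شد.,اشکال,اِشکال
اشکال در فرهنگهای باستانی، نمادها و معانی خاصی داشتهاند.,اشکال,اَشکال
"""


def strip_diacritics(text):
    """Drop Unicode nonspacing marks (category Mn), which include the
    Arabic-script short-vowel marks used for diacritization."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


def load_rows(handle):
    """Read rows as dicts keyed by the header names."""
    return list(csv.DictReader(handle))


def build_candidates(rows):
    """Map each plain word to the set of diacritized forms seen for it."""
    candidates = defaultdict(set)
    for row in rows:
        candidates[row["word"]].add(row["word_with_diacritics"])
    return candidates


rows = load_rows(io.StringIO(SAMPLE))
candidates = build_candidates(rows)

# Rows whose label does not match the plain word after removing diacritics.
mismatches = [
    row for row in rows
    if strip_diacritics(row["word_with_diacritics"]) != row["word"]
]
```

For a file on disk, pass `open(path, newline="", encoding="utf-8")` to `load_rows` instead of the `io.StringIO` wrapper.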
Training
v1.0 training stats:
Worst-performing words:
- سمت — 82.52%
- نکشن — 82.83%
- نکشه — 84.38%
- نکشم — 85.94%
- بکشیمش — 87.50%
Epoch 15/15 - Train Loss: 0.0667, Train Acc: 0.9781 - Val Loss: 0.0974, Val Acc: 0.9755
🎉 Training complete! Best validation accuracy: 0.9770
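A per-word ranking like the one listed above can be computed from evaluation predictions. The following is a minimal sketch that assumes predictions are available as `(word, gold_form, predicted_form)` tuples; the function names and the toy records are made up for illustration and are not taken from the training script.

```python
from collections import defaultdict


def per_word_accuracy(records):
    """records: iterable of (word, gold_form, predicted_form) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for word, gold, pred in records:
        total[word] += 1
        correct[word] += int(gold == pred)
    return {word: correct[word] / total[word] for word in total}


def worst_words(records, k=5):
    """Return the k words with the lowest accuracy, worst first."""
    accuracy = per_word_accuracy(records)
    return sorted(accuracy.items(), key=lambda item: item[1])[:k]


# Toy records with placeholder labels, not real evaluation data.
toy = [
    ("w1", "a", "a"),
    ("w1", "a", "b"),
    ("w2", "x", "x"),
    ("w2", "y", "y"),
]
ranking = worst_words(toy, k=2)
```

Ranking by accuracy alone can overstate problems for rare words, so it may be worth reporting the number of evaluation examples per word alongside each figure.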
Base model for SadeghK/Hamnevise: HooshvareLab/bert-fa-base-uncased