Hamnevise

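Persian diacritization of homographs using a masked language model (MLM), HooshvareLab/bert-fa-base-uncased. The model targets words that share the same written form but take different diacritics, and therefore different pronunciations, depending on the sentence they appear in.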

Architecture:

Input → ParsBERT (context) + Char CNN (morphology)
     → Shared Fusion
     → Word-Specific Classifiers
     → Only valid outputs per word
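
The last two stages of the diagram (word-specific classifiers that emit only valid outputs per word) can be illustrated with a small sketch. It is not the model's actual code: the `CANDIDATES` table, the `predict` helper, and the raw scores are hypothetical stand-ins for what the fused ParsBERT and Char CNN features would feed into each word's classifier head.

```python
import math

# Hypothetical candidate table: each homograph maps to its own valid
# diacritized forms. In the real model these would come from the training data.
CANDIDATES = {
    "اشکال": ["اِشکال", "اَشکال"],
}


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(word, scores):
    """Choose the most likely form among the word's own candidates only.

    `scores` stands in for the logits of the word-specific classifier head,
    one score per candidate form, in the same order as CANDIDATES[word].
    """
    forms = CANDIDATES[word]
    if len(scores) != len(forms):
        raise ValueError("expected one score per candidate form")
    probs = softmax(scores)
    best = max(range(len(forms)), key=probs.__getitem__)
    return forms[best], probs[best]


# Placeholder scores; a real head would compute them from the sentence context.
form, confidence = predict("اشکال", [0.2, 1.5])
```

Because each word has its own output space, the classifier cannot return a diacritized form that belongs to a different word.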

📊 Dataset Format

CSV file with three columns:

sentence,word,word_with_diacritics
اشکال در سیستم گرمایش، باعث سرد شدن ساختمان شد.,اشکال,اِشکال
اشکال در فرهنگ‌های باستانی، نمادها و معانی خاصی داشته‌اند.,اشکال,اَشکال
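
As a rough illustration of how a file in this format might be read, the sketch below parses the two sample rows with Python's standard `csv` module and collects the diacritized forms seen for each word. The helper names `load_rows` and `build_label_sets` are assumptions made for this example, not part of the project.

```python
import csv
import io
from collections import defaultdict

# The two sample rows from the dataset description, inlined for the example.
SAMPLE = """sentence,word,word_with_diacritics
اشکال در سیستم گرمایش، باعث سرد شدن ساختمان شد.,اشکال,اِشکال
اشکال در فرهنگ‌های باستانی، نمادها و معانی خاصی داشته‌اند.,اشکال,اَشکال
"""


def load_rows(handle):
    """Read (sentence, word, word_with_diacritics) tuples from a CSV handle."""
    reader = csv.DictReader(handle)
    return [
        (row["sentence"], row["word"], row["word_with_diacritics"])
        for row in reader
    ]


def build_label_sets(rows):
    """Map each word to the distinct diacritized forms observed for it."""
    labels = defaultdict(list)
    for _, word, diacritized in rows:
        if diacritized not in labels[word]:
            labels[word].append(diacritized)
    return dict(labels)


rows = load_rows(io.StringIO(SAMPLE))
label_sets = build_label_sets(rows)
```

The sample sentences use the Persian comma (،), which the CSV parser does not treat as a delimiter. A sentence containing an ASCII comma would need to be quoted in the file.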

Training

v1.0 training stats:

Worst performing words

  • سمت — 82.52%
  • نکشن — 82.83%
  • نکشه — 84.38%
  • نکشم — 85.94%
  • بکشیمش — 87.50%

Epoch 15/15 - Train Loss: 0.0667, Train Acc: 0.9781 - Val Loss: 0.0974, Val Acc: 0.9755

🎉 Training complete! Best validation accuracy: 0.9770
