---
license: apache-2.0
language:
- fa
metrics:
- accuracy
base_model:
- HooshvareLab/bert-fa-base-uncased
---

# Hamnevise

Persian diacritization using a masked language model (MLM), `HooshvareLab/bert-fa-base-uncased`, to disambiguate words in a given sentence that share the same written form but differ in pronunciation (homographs).

**Architecture:**

```
Input → ParsBERT (context) + Char CNN (morphology)
      → Shared Fusion
      → Word-Specific Classifiers
      → Only valid outputs per word
```
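
The last two stages of the diagram can be illustrated with a small, dependency-free sketch. It is not the model's actual code. The `fused` vector is a stand-in for the output of the shared fusion layer, and `CANDIDATES`, `HEADS`, and `classify` are hypothetical names with made-up weights. The point is only that a word-specific head scores just the forms valid for that word, so an invalid form can never be predicted.

```python
import math

# Valid diacritized forms per ambiguous word (forms taken from this card's dataset example).
CANDIDATES = {"اشکال": ["اِشکال", "اَشکال"]}

# Hypothetical per-word linear heads: one weight row per candidate form.
# The numbers are invented for illustration only.
HEADS = {"اشکال": [[0.5, -0.2, 0.1], [-0.3, 0.4, 0.2]]}


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(word, fused):
    """Score only the forms valid for `word`, using that word's own head.

    `fused` stands in for the fused ParsBERT + char-CNN feature vector.
    """
    rows = HEADS[word]
    logits = [sum(w * f for w, f in zip(row, fused)) for row in rows]
    probs = softmax(logits)
    forms = CANDIDATES[word]
    best = max(range(len(forms)), key=probs.__getitem__)
    return forms[best], dict(zip(forms, probs))


form, probs = classify("اشکال", [1.0, 0.0, 0.0])
print(form, probs)
```

Because each head has exactly as many outputs as its word has valid forms, no masking step is needed. A single shared classifier with a per-word mask over its logits would be an alternative design with the same guarantee.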

## 📊 Dataset Format

CSV file with three columns:

```csv
sentence,word,word_with_diacritics
اشکال در سیستم گرمایش، باعث سرد شدن ساختمان شد.,اشکال,اِشکال
اشکال در فرهنگهای باستانی، نمادها و معانی خاصی داشتهاند.,اشکال,اَشکال
```
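
A file in this format can be read with Python's standard `csv` module. The sketch below parses the two sample rows from an in-memory string and groups the diacritized forms seen for each word. The helper names `load_rows` and `candidate_forms` are illustrative and are not part of this repository.

```python
import csv
import io
from collections import defaultdict

# The two sample rows shown above, inlined so the example is self-contained.
SAMPLE = """sentence,word,word_with_diacritics
اشکال در سیستم گرمایش، باعث سرد شدن ساختمان شد.,اشکال,اِشکال
اشکال در فرهنگهای باستانی، نمادها و معانی خاصی داشتهاند.,اشکال,اَشکال
"""


def load_rows(handle):
    """Read (sentence, word, diacritized word) triples from an open CSV handle."""
    reader = csv.DictReader(handle)
    return [(r["sentence"], r["word"], r["word_with_diacritics"]) for r in reader]


def candidate_forms(rows):
    """Collect the set of diacritized forms observed for each undiacritized word."""
    forms = defaultdict(set)
    for _sentence, word, diacritized in rows:
        forms[word].add(diacritized)
    return {word: sorted(seen) for word, seen in forms.items()}


rows = load_rows(io.StringIO(SAMPLE))
print(candidate_forms(rows))
```

To read from disk instead, pass a handle opened with `open(path, newline="", encoding="utf-8")`, where `path` points at your own CSV file.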

## Training

v1.0 training stats:

Worst-performing words (per-word accuracy):

- سمت: 82.52%
- نکشن: 82.83%
- نکشه: 84.38%
- نکشم: 85.94%
- بکشیمش: 87.50%
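
This card does not say how the per-word figures were computed. One plausible way, shown here as a sketch with toy data, is to group evaluation examples by target word and take the fraction predicted correctly. The function names and the `(word, gold, predicted)` record layout are assumptions made for illustration.

```python
from collections import defaultdict


def per_word_accuracy(records):
    """Map each word to the fraction of its examples predicted correctly.

    `records` is an iterable of (word, gold_form, predicted_form) triples.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for word, gold, predicted in records:
        total[word] += 1
        correct[word] += int(gold == predicted)
    return {word: correct[word] / total[word] for word in total}


def worst_words(records, k=5):
    """Return the k words with the lowest accuracy, lowest first."""
    accuracy = per_word_accuracy(records)
    return sorted(accuracy.items(), key=lambda item: item[1])[:k]


# Toy records with placeholder labels, not real evaluation data.
toy = [("a", "x", "x"), ("a", "x", "y"), ("b", "z", "z")]
print(worst_words(toy, k=1))
```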

Epoch 15/15 - Train Loss: 0.0667, Train Acc: 0.9781 - Val Loss: 0.0974, Val Acc: 0.9755

🎉 Training complete! Best validation accuracy: 0.9770