---
license: apache-2.0
language:
- fa
metrics:
- accuracy
base_model:
- HooshvareLab/bert-fa-base-uncased
---
# Hamnevise
Persian diacritization using a masked language model (MLM), `HooshvareLab/bert-fa-base-uncased`, for words that share the same written form but take different diacritics (and pronunciations) depending on the sentence they appear in.

**Architecture:**
```
Input → ParsBERT (context) + Char CNN (morphology)
→ Shared Fusion
→ Word-Specific Classifiers
→ Only valid outputs per word
```
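
The last step of the diagram, "only valid outputs per word", can be illustrated with a small sketch. This is not the model's actual code: the `CANDIDATES` table, the `predict` function, and the example scores are all hypothetical, and the scores simply stand in for whatever the word-specific classifier head produces. The idea shown is that each ambiguous word only ever chooses among its own diacritized forms.

```python
import math

# Hypothetical candidate table: each ambiguous written form maps to the
# diacritized forms that are valid for it (taken from the dataset example).
CANDIDATES = {
    "اشکال": ["اِشکال", "اَشکال"],
}


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(word, scores):
    """Return the most probable valid form for `word` and its probability.

    `scores` is assumed to hold one raw score per valid candidate, in the
    same order as CANDIDATES[word].
    """
    options = CANDIDATES[word]
    if len(scores) != len(options):
        raise ValueError("expected one score per valid candidate")
    probs = softmax(scores)
    best = max(range(len(options)), key=probs.__getitem__)
    return options[best], probs[best]
```

Because the output space is limited to the candidates of the word in question, the classifier cannot return a form that belongs to a different word.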
## 📊 Dataset Format
CSV file with three columns:
```csv
sentence,word,word_with_diacritics
اشکال در سیستم گرمایش، باعث سرد شدن ساختمان شد.,اشکال,اِشکال
اشکال در فرهنگ‌های باستانی، نمادها و معانی خاصی داشته‌اند.,اشکال,اَشکال
```
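
As a rough illustration of how a file in this format could be read, the sketch below parses the two sample rows with Python's standard `csv` module and collects the diacritized forms seen for each word. The helper names `load_rows` and `candidates_by_word` are made up for this example and are not part of the project.

```python
import csv
import io
from collections import defaultdict

# The two sample rows from the format description above.
SAMPLE = """sentence,word,word_with_diacritics
اشکال در سیستم گرمایش، باعث سرد شدن ساختمان شد.,اشکال,اِشکال
اشکال در فرهنگ‌های باستانی، نمادها و معانی خاصی داشته‌اند.,اشکال,اَشکال
"""


def load_rows(handle):
    """Read (sentence, word, word_with_diacritics) tuples from an open CSV."""
    reader = csv.DictReader(handle)
    return [
        (row["sentence"], row["word"], row["word_with_diacritics"])
        for row in reader
    ]


def candidates_by_word(rows):
    """Collect every diacritized form observed for each written form."""
    table = defaultdict(set)
    for _, word, label in rows:
        table[word].add(label)
    return {word: sorted(labels) for word, labels in table.items()}


rows = load_rows(io.StringIO(SAMPLE))
candidates = candidates_by_word(rows)
```

For a file on disk, the same `load_rows` function can be given a handle opened with `open(path, newline="", encoding="utf-8")`. The Persian comma `،` inside the sentences is not the ASCII field separator, so these rows need no quoting.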
## Training
v1.0 training stats:

Worst-performing words (per-word accuracy):
- سمت — 82.52%
- نکشن — 82.83%
- نکشه — 84.38%
- نکشم — 85.94%
- بکشیمش — 87.50%

Final epoch (15/15):
- Train loss: 0.0667, train accuracy: 0.9781
- Validation loss: 0.0974, validation accuracy: 0.9755

Best validation accuracy: 0.9770
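
A ranking like the worst-performing words list can be produced by grouping predictions by word and computing accuracy within each group. The sketch below shows one way to do that. It is an assumption about how such a ranking might be computed, not the project's evaluation code, and the function names are hypothetical.

```python
from collections import Counter


def per_word_accuracy(records):
    """Compute accuracy separately for each written form.

    `records` is an iterable of (word, gold_form, predicted_form) tuples.
    """
    total = Counter()
    correct = Counter()
    for word, gold, predicted in records:
        total[word] += 1
        if gold == predicted:
            correct[word] += 1
    return {word: correct[word] / total[word] for word in total}


def worst_words(records, k=5):
    """Return the k (word, accuracy) pairs with the lowest accuracy."""
    accuracy = per_word_accuracy(records)
    return sorted(accuracy.items(), key=lambda item: item[1])[:k]
```

Reporting accuracy per word alongside the overall figure helps surface forms that stay hard to disambiguate even when aggregate validation accuracy is high.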