---
license: apache-2.0
language:
- fa
metrics:
- accuracy
base_model:
- HooshvareLab/bert-fa-base-uncased
---

# Hamnevise

Persian diacritization for homographs: words that share the same written form but are pronounced differently. Given a sentence, the model picks the correct diacritized form of the target word from its context. It is built on the masked-language model (MLM) `HooshvareLab/bert-fa-base-uncased`.

**Architecture:**

```
Input → ParsBERT (context) + Char CNN (morphology)
      → Shared Fusion
      → Word-Specific Classifiers
      → Only valid outputs per word
```

## 📊 Dataset Format

CSV file with three columns:

```csv
sentence,word,word_with_diacritics
اشکال در سیستم گرمایش، باعث سرد شدن ساختمان شد.,اشکال,اِشکال
اشکال در فرهنگ‌های باستانی، نمادها و معانی خاصی داشته‌اند.,اشکال,اَشکال
```

## Training

v1.0 training stats:

Worst-performing words:

- سمت: 82.52%
- نکشن: 82.83%
- نکشه: 84.38%
- نکشم: 85.94%
- بکشیمش: 87.50%

Epoch 15/15:

- Train Loss: 0.0667, Train Acc: 0.9781
- Val Loss: 0.0974, Val Acc: 0.9755

🎉 Training complete! Best validation accuracy: 0.9770
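
## Example: per-word label sets from the CSV

The dataset format and the "only valid outputs per word" step in the architecture can be illustrated with a small sketch. It reads the three-column CSV, collects the diacritized forms seen for each written word, and encodes every row as a class index within that word's own label set. The helper names `build_label_spaces` and `encode_examples` are hypothetical, and indexing labels by sorted order is an assumption made for illustration; the actual training code may map labels differently.

```python
import csv
import io
from collections import defaultdict

# Two rows copied from the dataset example in this card.
SAMPLE_CSV = """sentence,word,word_with_diacritics
اشکال در سیستم گرمایش، باعث سرد شدن ساختمان شد.,اشکال,اِشکال
اشکال در فرهنگ‌های باستانی، نمادها و معانی خاصی داشته‌اند.,اشکال,اَشکال
"""


def build_label_spaces(csv_text):
    """Map each written word to the sorted list of diacritized forms seen for it."""
    forms = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        forms[row["word"]].add(row["word_with_diacritics"])
    # Assumption: class indices follow sorted order of the diacritized forms.
    return {word: sorted(variants) for word, variants in forms.items()}


def encode_examples(csv_text, label_spaces):
    """Turn each row into (sentence, word, class index in that word's label set)."""
    examples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        classes = label_spaces[row["word"]]
        label = classes.index(row["word_with_diacritics"])
        examples.append((row["sentence"], row["word"], label))
    return examples


label_spaces = build_label_spaces(SAMPLE_CSV)
examples = encode_examples(SAMPLE_CSV, label_spaces)

for sentence, word, label in examples:
    print(word, label, "of", len(label_spaces[word]), "classes")
```

Keeping a separate label set per word means a classifier head for a given word only ever scores that word's own diacritized forms, so it cannot output a form that belongs to a different word.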