Commit beaf00c
Parent(s): b9b6999
FIX-48+49: ه→ة pass for IV words + trailing و in IV→IV guard
FIX-48: Dedicated ه→ة pass runs on ALL words (not just OOV).
- الحكومه, الشركه are IV in BERT vocab, so OOV cleanup skips them
- This pass converts ه→ة when ة form is also IV
- Protected words: فيه, عليه, له, etc.
- Targets: PC034, PC035
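The FIX-48 pass described above might look roughly like the sketch below. This is not the code from src/app.py: the function name `fix_ta_marbuta`, the `PROTECTED` set, and the use of a plain Python set for the vocabulary are all assumptions made for illustration, and it assumes the pass only looks at a word-final ه.

```python
# Hypothetical sketch of a word-final ه→ة pass; names are illustrative only.

# Words that legitimately end in ه (pronoun suffix) and must not be touched.
# The commit lists these three followed by "etc."; the real list is likely longer.
PROTECTED = {"فيه", "عليه", "له"}


def fix_ta_marbuta(words, vocab):
    """Replace a trailing ه with ة when the ة form is in the vocabulary.

    Runs on every word, whether or not the original is in-vocabulary (IV),
    since forms such as الحكومه can themselves be IV.
    """
    fixed = []
    for word in words:
        if word.endswith("ه") and word not in PROTECTED:
            candidate = word[:-1] + "ة"
            # Only convert when the ة form is a known word.
            if candidate in vocab:
                word = candidate
        fixed.append(word)
    return fixed


# Toy vocabulary standing in for the BERT vocab (assumption).
toy_vocab = {"الحكومه", "الحكومة", "الشركه", "الشركة", "فيه", "فية"}
result = fix_ta_marbuta(["الحكومه", "فيه", "كتابه"], toy_vocab)
```

Here `result` keeps فيه because it is protected, and keeps كتابه because its ة form is not in the toy vocabulary.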
FIX-49: Allow trailing و removal in IV→IV guard.
- المصنعو→المصنع was blocked (both IV, not a known fix)
- Now recognized: trailing و removal + و→وا verb fix
- Targets: PC004, PC042
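The two FIX-49 rules can be exercised in isolation with a small stand-alone sketch. The helper name `trailing_waw_score` and the `None` return for "no rule matched" are assumptions; in the actual change the checks sit inline in `_is_small_spelling_change` and fall through to `return 0.0`.

```python
# Hypothetical stand-alone version of the two trailing-و checks.

def trailing_waw_score(orig_word, corr_word):
    """Return 0.8 if the edit is one of the two trailing-و fixes, else None."""
    if not orig_word.endswith("و"):
        return None
    # FIX-49: trailing و removed (المصنعو→المصنع); result must keep >= 3 letters.
    if corr_word == orig_word[:-1] and len(corr_word) >= 3:
        return 0.8
    # FIX-49b: alif added after waw al-jama'a (حضرو→حضروا).
    if corr_word == orig_word + "ا" and len(orig_word) >= 3:
        return 0.8
    return None


removed = trailing_waw_score("المصنعو", "المصنع")
added_alif = trailing_waw_score("حضرو", "حضروا")
unrelated = trailing_waw_score("وكان", "وكأن")
```

The length checks keep very short words out of both rules, so an edit such as ابو→اب does not match.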
Tests: 39 passing.
- src/app.py +10 -0
@@ -1057,6 +1057,16 @@ def _is_small_spelling_change(orig_word, corr_word, vocab_manager=None):
                 )
                 return 0.5
             break
+    # 6. FIX-49: Trailing و removal (المصنعو→المصنع)
+    # Common model artifact — original has trailing و that should be removed
+    if (orig_word.endswith('و') and corr_word == orig_word[:-1]
+            and len(corr_word) >= 3):
+        return 0.8
+    # 7. FIX-49b: Trailing و→وا (حضرو→حضروا)
+    # Missing alif after waw al-jama'a
+    if (orig_word.endswith('و') and corr_word == orig_word + 'ا'
+            and len(orig_word) >= 3):
+        return 0.8
     # Both are valid words and change is NOT a known fix — REJECT
     # This prevents وكان→وكأن, etc.
     return 0.0