youssefreda9 committed on
Commit beaf00c · 1 Parent(s): b9b6999

FIX-48+49: ه→ة pass for IV words + trailing و in IV→IV guard

FIX-48: Dedicated ه→ة pass runs on ALL words (not just OOV).
- الحكومه and الشركه are in-vocabulary (IV) in the BERT vocab, so the OOV cleanup skips them
- This pass converts ه→ة when the ة form is also IV
- Protected words: فيه, عليه, له, etc.
- Targets: PC034, PC035
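
The FIX-48 code is not part of the diff shown in this commit view, so the following is only a minimal sketch of the rule as described above, not the code in src/app.py. The names `PROTECTED`, `fix_ta_marbuta`, and `vocab` are hypothetical, the vocabulary is a toy set standing in for the BERT vocab, and the sketch assumes the ه in question is word-final.

```python
# Hypothetical protected list; the commit names فيه, عليه, له "etc.",
# so the real list is longer than this.
PROTECTED = {"فيه", "عليه", "له"}


def fix_ta_marbuta(word, vocab):
    """Replace a final ه with ة when the ة form is in the vocabulary.

    Runs on every word, including in-vocabulary ones, but leaves
    protected words and words without a known ة form unchanged.
    """
    if word in PROTECTED or not word.endswith("ه"):
        return word
    candidate = word[:-1] + "ة"
    return candidate if candidate in vocab else word


# Toy vocabulary (assumption): both spellings of the two nouns are present.
vocab = {"الحكومه", "الحكومة", "الشركه", "الشركة", "فيه", "كتابه"}

for w in ["الحكومه", "الشركه", "فيه", "كتابه"]:
    print(w, "->", fix_ta_marbuta(w, vocab))
```

In this sketch فيه is skipped because it is protected, and كتابه is left alone because كتابة is not in the toy vocabulary.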

FIX-49: Allow trailing و removal in IV→IV guard.
- المصنعو→المصنع was blocked (both IV, not a known fix)
- Now recognized: trailing و removal + و→وا verb fix
- Targets: PC004, PC042

Tests: 39 passing.

Files changed (1)
  1. src/app.py +10 -0
src/app.py CHANGED
@@ -1057,6 +1057,16 @@ def _is_small_spelling_change(orig_word, corr_word, vocab_manager=None):
  )
  return 0.5
  break
+ # 6. FIX-49: Trailing و removal (المصنعو→المصنع)
+ # Common model artifact — original has trailing و that should be removed
+ if (orig_word.endswith('و') and corr_word == orig_word[:-1]
+         and len(corr_word) >= 3):
+     return 0.8
+ # 7. FIX-49b: Trailing و→وا (حضرو→حضروا)
+ # Missing alif after waw al-jama'a
+ if (orig_word.endswith('و') and corr_word == orig_word + 'ا'
+         and len(orig_word) >= 3):
+     return 0.8
  # Both are valid words and change is NOT a known fix — REJECT
  # This prevents وكان→وكأن, etc.
  return 0.0
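
To try the two added checks outside src/app.py, they can be lifted into a standalone function. This is a sketch under assumptions: `trailing_waw_score` is a hypothetical name, and the earlier checks in `_is_small_spelling_change` (and its `vocab_manager` argument) are omitted, so only the FIX-49 rules and the final rejection are reproduced.

```python
def trailing_waw_score(orig_word, corr_word):
    """Score an IV-to-IV change using only the two FIX-49 rules.

    Returns 0.8 for a recognized trailing-و fix, otherwise 0.0 (reject).
    """
    # Rule 6: the original has a trailing و that the correction removes.
    if (orig_word.endswith("و") and corr_word == orig_word[:-1]
            and len(corr_word) >= 3):
        return 0.8
    # Rule 7: the correction adds the missing alif after a trailing و.
    if (orig_word.endswith("و") and corr_word == orig_word + "ا"
            and len(orig_word) >= 3):
        return 0.8
    # Any other change between two valid words is rejected.
    return 0.0


print(trailing_waw_score("المصنعو", "المصنع"))  # trailing و removed
print(trailing_waw_score("حضرو", "حضروا"))      # alif added after و
print(trailing_waw_score("وكان", "وكأن"))        # unrelated change
```

The length checks keep the rules from firing on very short words, where a final و is more likely to be part of the word itself.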