| # BAYAN v2.0 — Task List |
|
|
| ## Phase A: Test Infrastructure ✅ (full baseline run pending) |
| - [x] Create `tests/v2/test_level1_raw.py` — Raw model tests with TP/FP/FN/TN verdicts |
| - [x] Create `tests/v2/test_level2_solo.py` — Solo API endpoint tests |
| - [x] Create `tests/v2/test_level3_integrated.py` — Full pipeline tests |
| - [x] Create `tests/v2/benchmark_matrix.py` — Master comparison runner |
| - [x] Fix verdict logic (strip terminal punctuation before comparison) |
| - [x] Run baseline on entities + spelling datasets |
| - [ ] Run full 320-test baseline across all 3 levels |
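
The verdict fix above (strip terminal punctuation before comparison) might look roughly like the following. This is a minimal sketch, not the code in `tests/v2/`: the names `strip_terminal` and `classify_verdict`, the punctuation set, and the TP/FP/FN/TN rule are all assumptions made for illustration.

```python
# Hypothetical sketch: names, punctuation set, and verdict rule are assumptions.
TERMINAL_PUNCT = ".!?؟؛،"


def strip_terminal(text: str) -> str:
    """Drop surrounding whitespace and trailing terminal punctuation."""
    return text.strip().rstrip(TERMINAL_PUNCT).rstrip()


def classify_verdict(original: str, expected: str, output: str) -> str:
    """Return TP/FP/FN/TN for one test case, ignoring terminal punctuation."""
    src, exp, out = (strip_terminal(t) for t in (original, expected, output))
    if src != exp:
        # The input contains an error: the model must produce the expected fix.
        return "TP" if out == exp else "FN"
    # The input is already correct: any change is a false positive.
    return "TN" if out == src else "FP"
```

With this rule, a model output that differs from the expected text only by a trailing period still counts as a match.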
|
|
| ## Phase A.1: Project Cleanup ✅ |
| - [x] Archive legacy scripts (AraSpell.py, Grammer_Rules.py, PuncAra.py) |
| - [x] Archive 36 old phase/verification reports |
| - [x] Archive 23 old test files + 8 phase10 helpers |
| - [x] Delete 35 orphaned debug/temp files |
| - [x] Fix .gitignore corruption (binary null bytes) |
| - [x] Fix PROJECT_DESCRIPTION.md stale reference |
| - [x] Archive docs/audit + docs/audits |
|
|
| ## Phase B: Extract Stages (NOT STARTED) |
| - [ ] Create `src/nlp/stages/spelling_stage.py` |
| - [ ] Create `src/nlp/stages/grammar_stage.py` |
| - [ ] Create `src/nlp/stages/punctuation_stage.py` |
| - [ ] Each stage wraps: model call → filter → verdict |
| - [ ] Hash (comment out) old inline stage code in `app.py` |
| - [ ] Re-run v2 benchmark → must match Phase A baseline |
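
The "model call → filter → verdict" shape of a stage could be sketched as below. Everything here is hypothetical: the `Stage` and `StageResult` names, the `(start, end, replacement)` span format, and the `"flagged"`/`"clean"` verdict values are assumptions, since the task list does not specify them.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Assumed suggestion format: (start, end, replacement).
Span = Tuple[int, int, str]
Model = Callable[[str], List[Span]]
Filter = Callable[[str, List[Span]], List[Span]]


@dataclass
class StageResult:
    name: str
    suggestions: List[Span]
    verdict: str  # "flagged" if any suggestion survives filtering, else "clean"


class Stage:
    """Hypothetical stage wrapper: model call, then filters, then verdict."""

    def __init__(self, name: str, model: Model, filters: Sequence[Filter] = ()):
        self.name = name
        self.model = model
        self.filters = list(filters)

    def run(self, text: str) -> StageResult:
        suggestions = self.model(text)            # 1. model call
        for apply_filter in self.filters:         # 2. filters, in order
            suggestions = apply_filter(text, suggestions)
        verdict = "flagged" if suggestions else "clean"  # 3. verdict
        return StageResult(self.name, suggestions, verdict)
```

Keeping the model and filters as plain callables means each stage can be tested with a stub model, which fits the "must match baseline" check after each extraction.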
|
|
| ## Phase C: Extract Filters (NOT STARTED) |
| - [ ] Create `src/nlp/filters/` module |
| - [ ] Extract overlap resolution, religious guard, entity guard |
| - [ ] Hash old filter code in `app.py` |
| - [ ] Re-run v2 benchmark → must match baseline |
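
For the overlap-resolution filter, one possible policy is sketched here. The rule is an assumption, not the current `app.py` behavior: the earliest span wins, and on equal start positions the longest span wins. `resolve_overlaps` and the `(start, end, label)` tuple format are hypothetical.

```python
def resolve_overlaps(spans):
    """Keep a non-overlapping subset of (start, end, label) spans.

    Assumed policy: earliest start wins; on ties, the longest span wins.
    `end` is exclusive.
    """
    kept = []
    for span in sorted(spans, key=lambda s: (s[0], -(s[1] - s[0]))):
        # Accept the span only if it starts at or after the end of the last kept one.
        if not kept or span[0] >= kept[-1][1]:
            kept.append(span)
    return kept
```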
|
|
| ## Phase D: Extract Preprocessors (NOT STARTED) |
| - [ ] Create `src/nlp/preprocessors/` module |
| - [ ] Extract text normalization, diacritic handling, chunk splitting |
| - [ ] Hash old preprocessor code in `app.py` |
| - [ ] Re-run v2 benchmark → must match baseline |
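
The three preprocessors named above could start from something like this sketch. The function names, the exact character set removed, and the `max_chars` default are assumptions; the real behavior should be copied from the existing code in `app.py` so the benchmark still matches.

```python
import re
import unicodedata

# Arabic tashkeel (U+064B to U+0652) and superscript alef (U+0670),
# plus tatweel (U+0640), which is an elongation mark rather than a diacritic.
_MARKS = re.compile("[\u064B-\u0652\u0670\u0640]")


def normalize(text: str) -> str:
    """Apply NFC normalization and collapse runs of whitespace."""
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", text).strip()


def strip_diacritics(text: str) -> str:
    """Remove Arabic diacritics and tatweel."""
    return _MARKS.sub("", text)


def split_chunks(text: str, max_chars: int = 200) -> list:
    """Greedily pack whitespace-separated words into chunks of at most max_chars.

    A single word longer than max_chars becomes its own chunk.
    """
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) > max_chars and current:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```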
|
|
| ## Phase E: Create Pipeline Orchestrator (NOT STARTED) |
| - [ ] Create `src/nlp/pipeline.py` to orchestrate stages via `PipelineContext` |
| - [ ] Wire the `/api/analyze` route in `app.py` to call `pipeline.run(text)` |
| - [ ] Hash old monolithic analyze code in `app.py` |
| - [ ] Re-run v2 benchmark → must match baseline |
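
A minimal sketch of the orchestrator idea follows, assuming each stage is a callable that reads and mutates a shared context. Only the names `PipelineContext` and `run(text)` come from the task list; the fields `suggestions` and `trace`, and the `Pipeline` class itself, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PipelineContext:
    """Shared state passed through every stage (fields are assumed)."""
    text: str
    suggestions: List = field(default_factory=list)
    trace: List[str] = field(default_factory=list)


class Pipeline:
    """Run stages in order against one PipelineContext."""

    def __init__(self, stages: List[Callable[[PipelineContext], None]]):
        self.stages = list(stages)

    def run(self, text: str) -> PipelineContext:
        ctx = PipelineContext(text=text)
        for stage in self.stages:
            stage(ctx)
            # Record which stage ran, for debugging and benchmark comparison.
            ctx.trace.append(getattr(stage, "__name__", type(stage).__name__))
        return ctx
```

A route handler would then reduce to building the response from the returned context, which is the end state Phase F aims for.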
|
|
| ## Phase F: Clean app.py (NOT STARTED) |
| - [ ] Move helpers (`get_word_positions`, `OffsetMapper`, etc.) to utility modules |
| - [ ] Remove all hashed (commented-out) code blocks |
| - [ ] `app.py` should contain only Flask routes and `pipeline.run()` calls |
| - [ ] Run final v2 benchmark → must match baseline |
| - [ ] Target: `app.py` < 500 lines |
|
|