# BAYAN v2.0 — Task List ## Phase A: Test Infrastructure ✅ - [x] Create `tests/v2/test_level1_raw.py` — Raw model tests with TP/FP/FN/TN verdicts - [x] Create `tests/v2/test_level2_solo.py` — Solo API endpoint tests - [x] Create `tests/v2/test_level3_integrated.py` — Full pipeline tests - [x] Create `tests/v2/benchmark_matrix.py` — Master comparison runner - [x] Fix verdict logic (strip terminal punctuation before comparison) - [x] Run baseline on entities + spelling datasets - [ ] Run full 320-test baseline across all 3 levels ## Phase A.1: Project Cleanup ✅ - [x] Archive legacy scripts (AraSpell.py, Grammer_Rules.py, PuncAra.py) - [x] Archive 36 old phase/verification reports - [x] Archive 23 old test files + 8 phase10 helpers - [x] Delete 35 orphaned debug/temp files - [x] Fix .gitignore corruption (binary null bytes) - [x] Fix PROJECT_DESCRIPTION.md stale reference - [x] Archive docs/audit + docs/audits ## Phase B: Extract Stages (NOT STARTED) - [ ] Create `src/nlp/stages/spelling_stage.py` - [ ] Create `src/nlp/stages/grammar_stage.py` - [ ] Create `src/nlp/stages/punctuation_stage.py` - [ ] Each stage wraps: model call → filter → verdict - [ ] Hash (comment out) old inline stage code in `app.py` - [ ] Re-run v2 benchmark → must match Phase A baseline ## Phase C: Extract Filters (NOT STARTED) - [ ] Create `src/nlp/filters/` module - [ ] Extract overlap resolution, religious guard, entity guard - [ ] Hash old filter code in `app.py` - [ ] Re-run v2 benchmark → must match baseline ## Phase D: Extract Preprocessors (NOT STARTED) - [ ] Create `src/nlp/preprocessors/` module - [ ] Extract text normalization, diacritic handling, chunk splitting - [ ] Hash old preprocessor code in `app.py` - [ ] Re-run v2 benchmark → must match baseline ## Phase E: Create Pipeline Orchestrator (NOT STARTED) - [ ] Create `src/nlp/pipeline.py` — orchestrates stages via PipelineContext - [ ] Wire `app.py` /api/analyze to use `pipeline.run(text)` - [ ] Hash old monolithic analyze code in `app.py` - [ ] Re-run v2 benchmark → must match baseline ## Phase F: Clean app.py (NOT STARTED) - [ ] Move helpers (get_word_positions, OffsetMapper, etc.) to utility modules - [ ] Remove all hashed (commented) code blocks - [ ] app.py should only contain: Flask routes + pipeline.run() calls - [ ] Final v2 benchmark → must match baseline - [ ] Target: app.py < 500 lines