
BAYAN v2.0 — Task List

Phase A: Test Infrastructure ✅

Create tests/v2/test_level1_raw.py — Raw model tests with TP/FP/FN/TN verdicts
Create tests/v2/test_level2_solo.py — Solo API endpoint tests
Create tests/v2/test_level3_integrated.py — Full pipeline tests
Create tests/v2/benchmark_matrix.py — Master comparison runner
Fix verdict logic (strip terminal punctuation before comparison)
Run baseline on entities + spelling datasets
Run full 320-test baseline across all 3 levels
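The verdict fix above (stripping terminal punctuation before comparison) could look roughly like the sketch below. The function names, the punctuation set, and the rule that a wrong correction counts as FN are assumptions for illustration, not the project's actual logic.

```python
# Hypothetical sketch of TP/FP/FN/TN verdict logic; names are illustrative.
# Assumed set of terminal marks (Latin plus Arabic comma, semicolon, question mark).
TERMINAL_PUNCTUATION = ".!?:\u060C\u061B\u061F"


def strip_terminal(text: str) -> str:
    """Trim whitespace and any trailing terminal punctuation."""
    return text.strip().rstrip(TERMINAL_PUNCTUATION).rstrip()


def verdict(original: str, expected: str, output: str) -> str:
    """Classify one test case after normalizing terminal punctuation."""
    src, gold, out = (strip_terminal(t) for t in (original, expected, output))
    needs_fix = src != gold   # the input really contains an error
    changed = out != src      # the model altered the input
    if needs_fix:
        # Assumption: a wrong or missing correction is counted as FN.
        return "TP" if out == gold else "FN"
    return "FP" if changed else "TN"
```

With this normalization, a model that fixes the spelling but also appends a full stop is still scored as a true positive instead of a mismatch.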

Phase A.1: Project Cleanup ✅

Archive legacy scripts (AraSpell.py, Grammer_Rules.py, PuncAra.py)
Archive 36 old phase/verification reports
Archive 23 old test files + 8 phase10 helpers
Delete 35 orphaned debug/temp files
Fix .gitignore corruption (binary null bytes)
Fix PROJECT_DESCRIPTION.md stale reference
Archive docs/audit + docs/audits

Phase B: Extract Stages (NOT STARTED)

Create src/nlp/stages/spelling_stage.py
Create src/nlp/stages/grammar_stage.py
Create src/nlp/stages/punctuation_stage.py
Each stage wraps: model call → filter → verdict
Hash out (comment out with #) old inline stage code in app.py
Re-run v2 benchmark → must match Phase A baseline
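As a rough illustration of the "model call → filter → verdict" wrapper, a stage might be shaped like the sketch below. `Suggestion`, `StageResult`, `Stage`, the toy model, and the "flagged"/"clean" verdict values are all hypothetical. The real stages would call the project's own models and filters.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class Suggestion:
    start: int          # character offset where the span begins
    end: int            # exclusive end offset
    original: str
    replacement: str


ModelFn = Callable[[str], List[Suggestion]]
FilterFn = Callable[[str, List[Suggestion]], List[Suggestion]]


@dataclass
class StageResult:
    stage: str
    suggestions: List[Suggestion]
    verdict: str        # assumed values: "flagged" or "clean"


@dataclass
class Stage:
    name: str
    model: ModelFn
    filters: List[FilterFn] = field(default_factory=list)

    def run(self, text: str) -> StageResult:
        suggestions = self.model(text)                  # 1. model call
        for apply_filter in self.filters:               # 2. filters, in order
            suggestions = apply_filter(text, suggestions)
        verdict = "flagged" if suggestions else "clean"  # 3. verdict
        return StageResult(self.name, suggestions, verdict)


def toy_spelling_model(text: str) -> List[Suggestion]:
    """Stand-in for a real model: only knows the typo 'teh'."""
    start = text.find("teh")
    if start == -1:
        return []
    return [Suggestion(start, start + 3, "teh", "the")]


def drop_noops(text: str, suggestions: List[Suggestion]) -> List[Suggestion]:
    """Example filter: discard suggestions that change nothing."""
    return [s for s in suggestions if s.original != s.replacement]


spelling_stage = Stage("spelling", toy_spelling_model, [drop_noops])
```

Keeping the model and the filters as plain callables lets each stage be benchmarked on its own, which fits the requirement that results match the Phase A baseline after extraction.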

Phase C: Extract Filters (NOT STARTED)

Create src/nlp/filters/ module
Extract overlap resolution, religious guard, entity guard
Hash old filter code in app.py
Re-run v2 benchmark → must match baseline
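Of the filters listed, overlap resolution is the easiest to sketch in isolation. The version below is one possible approach, assuming each suggestion carries a confidence score and that the highest-scoring span wins a conflict. The `Span` type and its fields are hypothetical, and the existing code in app.py may resolve overlaps differently.

```python
from typing import List, NamedTuple


class Span(NamedTuple):
    start: int          # inclusive character offset
    end: int            # exclusive character offset
    replacement: str
    score: float        # assumed confidence score from the model


def overlaps(a: Span, b: Span) -> bool:
    """True when the two half-open ranges share at least one character."""
    return a.start < b.end and b.start < a.end


def resolve_overlaps(spans: List[Span]) -> List[Span]:
    """Greedy resolution: keep higher-scoring spans, drop anything that collides."""
    kept: List[Span] = []
    for span in sorted(spans, key=lambda s: (-s.score, s.start)):
        if not any(overlaps(span, k) for k in kept):
            kept.append(span)
    return sorted(kept, key=lambda s: s.start)
```

The result is returned in document order so that later code can apply replacements from left to right.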

Phase D: Extract Preprocessors (NOT STARTED)

Create src/nlp/preprocessors/ module
Extract text normalization, diacritic handling, chunk splitting
Hash old preprocessor code in app.py
Re-run v2 benchmark → must match baseline
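The three preprocessors named above might look something like the following minimal sketch. The function names, the diacritic range (Arabic harakat U+064B to U+0652 plus superscript alef U+0670), and the sentence-based chunking rule are assumptions. The code currently in app.py may normalize and split differently.

```python
import re
import unicodedata
from typing import List

# Assumed diacritic set: fathatan..sukun plus superscript alef.
ARABIC_DIACRITICS = re.compile("[\u064B-\u0652\u0670]")
# Split after Latin or Arabic sentence-ending marks followed by whitespace.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?\u061F])\s+")


def normalize_text(text: str) -> str:
    """Apply Unicode NFC and collapse runs of whitespace."""
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", text).strip()


def strip_diacritics(text: str) -> str:
    """Remove Arabic short-vowel marks so comparisons ignore tashkeel."""
    return ARABIC_DIACRITICS.sub("", text)


def split_chunks(text: str, max_chars: int = 200) -> List[str]:
    """Pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole, not cut.
    """
    chunks: List[str] = []
    current = ""
    for sentence in SENTENCE_BOUNDARY.split(text.strip()):
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Splitting on sentence boundaries keeps each chunk self-contained for the models, at the cost of occasionally exceeding the limit for very long sentences.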

Phase E: Create Pipeline Orchestrator (NOT STARTED)

Create src/nlp/pipeline.py — orchestrates stages via PipelineContext
Wire app.py /api/analyze to use pipeline.run(text)
Hash old monolithic analyze code in app.py
Re-run v2 benchmark → must match baseline
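One way the orchestrator could be laid out is sketched below. Only the names `PipelineContext` and `pipeline.run(text)` come from the task list. The context fields, the `Pipeline` class, and the two toy stages are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class PipelineContext:
    """Shared state passed through every stage (fields are assumptions)."""
    original_text: str
    text: str
    suggestions: List[Dict] = field(default_factory=list)
    stages_run: List[str] = field(default_factory=list)


StageFn = Callable[[PipelineContext], None]


class Pipeline:
    def __init__(self, stages: List[Tuple[str, StageFn]]):
        self.stages = stages

    def run(self, text: str) -> PipelineContext:
        ctx = PipelineContext(original_text=text, text=text)
        for name, stage in self.stages:
            stage(ctx)                    # each stage mutates the context
            ctx.stages_run.append(name)
        return ctx


def spelling_stage(ctx: PipelineContext) -> None:
    """Toy stage: flags the typo 'teh'."""
    start = ctx.text.find("teh")
    if start != -1:
        ctx.suggestions.append(
            {"stage": "spelling", "start": start, "end": start + 3, "replacement": "the"}
        )


def punctuation_stage(ctx: PipelineContext) -> None:
    """Toy stage: suggests a full stop when the text has no terminal mark."""
    if ctx.text and ctx.text[-1] not in ".!?\u061F":
        end = len(ctx.text)
        ctx.suggestions.append(
            {"stage": "punctuation", "start": end, "end": end, "replacement": "."}
        )


pipeline = Pipeline([("spelling", spelling_stage), ("punctuation", punctuation_stage)])
```

With this shape, the /api/analyze handler would only need to call `pipeline.run(text)` and serialize the returned context.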

Phase F: Clean app.py (NOT STARTED)

Move helpers (get_word_positions, OffsetMapper, etc.) to utility modules
Remove all hashed (commented) code blocks
app.py should only contain: Flask routes + pipeline.run() calls
Final v2 benchmark → must match baseline
Target: app.py < 500 lines
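Every phase ends with the same gate: the v2 benchmark must match the baseline. A check of that kind could be as simple as the sketch below, assuming results are stored as a mapping from test ID to verdict string. The function names and the result format are hypothetical, not the format benchmark_matrix.py actually writes.

```python
from typing import Dict, List, Tuple


def diff_against_baseline(
    baseline: Dict[str, str], current: Dict[str, str]
) -> Dict[str, object]:
    """Report verdicts that changed, plus tests missing from or added to the run."""
    shared = baseline.keys() & current.keys()
    changed: Dict[str, Tuple[str, str]] = {
        test_id: (baseline[test_id], current[test_id])
        for test_id in sorted(shared)
        if baseline[test_id] != current[test_id]
    }
    missing: List[str] = sorted(baseline.keys() - current.keys())
    added: List[str] = sorted(current.keys() - baseline.keys())
    return {"changed": changed, "missing": missing, "added": added}


def matches_baseline(baseline: Dict[str, str], current: Dict[str, str]) -> bool:
    """True only when every test ID is present with an identical verdict."""
    report = diff_against_baseline(baseline, current)
    return not (report["changed"] or report["missing"] or report["added"])
```

Failing on any changed verdict, including an FN that becomes a TP, keeps the refactor strictly behavior-preserving. Improvements would then be recorded as a deliberate baseline update.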