# Contributing

## Repo layout

```
src/guichetoi/          Library (importable as `guichetoi.…`)
  inference.py          LayoutLMv3 classifier + extractor pipeline
  recommendation.py     Rule engine that issues a completeness verdict for a request (demande)
  cms.py                Pre-fills the CMS IMMO 9 BANBOU xlsx from a Verdict
  api/main.py           FastAPI service wrapping all of the above
scripts/                Training pipeline + batch utilities (CLIs)
apps/                   UI applications (Streamlit demo)
tools/                  One-off dev / debug scripts
tests/                  Pytest suite
assets/                 Templates, logos, non-data static files
data/                   label_mappings.json (other data dirs are gitignored)
docs/                   Internal markdown docs
.github/                CI workflow + PR/issue templates
```

## Local setup

```bash
python -m venv .venv
source .venv/bin/activate        # or: .venv\Scripts\activate on Windows
pip install -e ".[dev,ui]"
pip install -r requirements.txt  # exact pins (optional; for reproducibility)
```

External requirement: **Tesseract OCR with the French language pack** must be on `PATH` for inference to work.

## Branch strategy

GitHub Flow:

- `main` is always deployable. Protected: requires PR + green CI to merge.
- One topic branch per work unit. Naming:
  - `feature/`: new capability
  - `fix/`: bug fix
  - `chore/`: refactor, infra, deps
  - `docs/`: docs only
- Squash-merge into `main`. Delete the branch after merge.

## Workflow

```bash
git checkout main && git pull
git checkout -b feature/my-thing
# … edits …
pytest -q                        # local sanity check
ruff check src/ tests/           # lint
mypy --config-file mypy.ini src/guichetoi/cms.py src/guichetoi/recommendation.py
git push -u origin feature/my-thing
gh pr create                     # or open via github.com
```

CI runs lint + tests on every PR. Both must be green to merge.

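
The branch prefixes from the branch strategy can be checked locally before pushing. The snippet below is a minimal sketch, not part of the repo: the function name `is_valid_topic_branch` is hypothetical, and the rule for the part after the slash (lowercase letters, digits, dots, underscores, hyphens) is an assumption, since only the prefixes are specified here.

```python
import re

# Prefixes come from the branch strategy in this guide.
ALLOWED_PREFIXES = ("feature", "fix", "chore", "docs")

# Assumption: the slug after the prefix starts with a lowercase letter or
# digit and then uses lowercase letters, digits, '.', '_' or '-'.
_PREFIX_GROUP = "|".join(ALLOWED_PREFIXES)
_BRANCH_RE = re.compile(rf"(?:{_PREFIX_GROUP})/[a-z0-9][a-z0-9._-]*")


def is_valid_topic_branch(name: str) -> bool:
    """Return True if `name` looks like `<prefix>/<slug>` with an allowed prefix."""
    return _BRANCH_RE.fullmatch(name) is not None
```

For example, `is_valid_topic_branch("feature/my-thing")` is true, while `is_valid_topic_branch("my-thing")` and `is_valid_topic_branch("feature/")` are false.
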
## What never goes in git

- Customer documents (`DataSet1/`, `DataSet2/`, `DataRef/`)
- Real extracted PII (`assets/sample_verdicts.json`)
- Model weights (`models/`)
- Label Studio raw exports (`project-*-at-*.json`)

`.gitignore` enforces these. When in doubt, check `git status` before committing, and never use `git add -f` to override an ignore rule.
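
As a second line of defence behind `.gitignore`, the list above can be turned into a check over staged files. This is a hedged sketch rather than an existing repo tool: the names `is_forbidden`, `staged_files`, and `main` are hypothetical, and it assumes the listed directories sit at the repo root and that the Label Studio exports can appear in any directory.

```python
import subprocess
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Entries copied from the list in this guide.
# Assumption: these directories live at the top level of the repo.
FORBIDDEN_DIRS = ("DataSet1", "DataSet2", "DataRef", "models")
FORBIDDEN_FILES = ("assets/sample_verdicts.json",)
FORBIDDEN_GLOBS = ("project-*-at-*.json",)  # matched against the file name only


def is_forbidden(path: str) -> bool:
    """Return True if a repo-relative path falls under one of the banned entries."""
    p = PurePosixPath(path)
    if p.parts and p.parts[0] in FORBIDDEN_DIRS:
        return True
    if str(p) in FORBIDDEN_FILES:
        return True
    return any(fnmatch(p.name, pattern) for pattern in FORBIDDEN_GLOBS)


def staged_files() -> list[str]:
    """List the paths currently staged for commit, relative to the repo root."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def main() -> int:
    """Print offending staged paths; return 1 if any were found, else 0."""
    offenders = [f for f in staged_files() if is_forbidden(f)]
    for f in offenders:
        print(f"refusing to commit: {f}")
    return 1 if offenders else 0
```

Calling `main()` from a local pre-commit hook and exiting with its return value would block a commit that contains, say, `DataSet1/scan.pdf`, even if the file was staged with `git add -f`.
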