# PROGRESS.md: Autonomous Build Ledger
This file is the memory for the overnight run. It outlives any single agent
session. The loop reads it at the start of every iteration, does the **next
unchecked task in the TONIGHT section only**, proves the task's acceptance
check, commits, and ticks the box.
## Protocol (read every iteration)
1. Do the next unchecked task under **TONIGHT**, exactly one per iteration.
2. Implement it per `docs/05_build_plan.md` and the other specs in `docs/`.
3. Run the task's **Check** command and paste the output (the loop's evaluator
only sees what is printed).
4. If it passes: commit just that task's changes with the listed **Commit**
message; tick the box; commit the ledger update.
5. If it fails after reasonable attempts: add a one-line note under **BLOCKED**,
commit, and move to the next task. Do not halt the whole run.
6. **Never** start a task under **TOMORROW**.
7. **Never** add a dependency outside the night set in task **N1**.
8. The specs already exist in `docs/`; do not regenerate or edit them.
When every TONIGHT box is checked, run `uv run pytest -q` and
`uv run ruff check .`; if both are clean, print `DONE_ALL` and stop.
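
The "next unchecked task in the TONIGHT section only" rule can be made mechanical. The following is a minimal sketch, not the loop's actual implementation: `next_unchecked_task` and `TASK_RE` are hypothetical names, and the regex assumes the checklist format used in this file (`- [ ] **Title**`).

```python
import re
from typing import Optional

# Hypothetical pattern: matches "- [ ] **Title**" or "- [x] **Title**".
TASK_RE = re.compile(r"^- \[(?P<mark>[ x])\] \*\*(?P<title>.+?)\*\*")


def next_unchecked_task(ledger_text: str) -> Optional[str]:
    """Return the title of the first unchecked task under '## TONIGHT', else None."""
    in_tonight = False
    for line in ledger_text.splitlines():
        if line.startswith("## "):
            # Any level-2 heading other than TONIGHT puts us out of scope,
            # so TOMORROW items can never be selected.
            in_tonight = line.strip() == "## TONIGHT"
            continue
        if not in_tonight:
            continue
        match = TASK_RE.match(line)
        if match and match.group("mark") == " ":
            return match.group("title")
    return None
```

When the function returns `None`, every TONIGHT box is ticked and the final `pytest` and `ruff` gate applies.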
---
## Night dependency set (the ONLY packages to add tonight)
- Runtime: `pydantic`, `pydantic-settings`
- Dev: `pytest`, `ruff`
Everything else (docling, paddleocr/pytesseract, google-genai, ollama, gradio,
watchdog) belongs to TOMORROW and must NOT be added tonight.
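
Protocol rule 7 could be backed by a small guard. This is a sketch under assumptions: `NIGHT_DEPS` mirrors the list above, `disallowed` and `_package_name` are hypothetical helpers, and the caller is assumed to supply the requirement strings (for example, read from `pyproject.toml`).

```python
import re

# The night dependency set from this ledger (runtime + dev).
NIGHT_DEPS = {"pydantic", "pydantic-settings", "pytest", "ruff"}


def _package_name(requirement: str) -> str:
    """Extract and normalize the package name from a requirement string."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    if match is None:
        raise ValueError(f"cannot parse requirement: {requirement!r}")
    # Treat runs of '-', '_' and '.' as equivalent and ignore case.
    return re.sub(r"[-_.]+", "-", match.group(1)).lower()


def disallowed(requirements: list[str]) -> list[str]:
    """Return the sorted package names that are outside the night set."""
    return sorted({_package_name(r) for r in requirements} - NIGHT_DEPS)
```

For example, `disallowed(["pydantic>=2", "gradio>=4"])` reports `gradio`, which belongs to TOMORROW.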
---
## TONIGHT
- [x] **N1: Project scaffold** (build plan 0.1)
Run `uv init`; `uv python pin 3.11`; set `requires-python = ">=3.11"`. Create
the `src/doc_agent/` package tree and `tests/` from the setup-doc layout (the
`docs/` specs are already present; leave them). Add `.gitignore` (ignore
`data/`, `.env`, `.venv/`, caches; **commit** `uv.lock` and `.python-version`)
and an empty `README.md`. Add night deps: `uv add pydantic pydantic-settings`
and `uv add --dev pytest ruff`.
Check: `uv sync && uv run python -c "import sys; assert sys.version_info[:2]==(3,11)" && cat .python-version`
Commit: `phase 0.1: project scaffold (uv, py3.11, package layout)`
- [x] **N2: Config loader** (0.2)
`src/doc_agent/config.py` with pydantic-settings: load env, validate combos
(gemini requires key; `vision_direct` requires a multimodal backend), fail
fast with clear messages. Add `tests/test_config.py`.
Check: `uv run pytest tests/test_config.py -q`
Commit: `phase 0.2: config loader with validation`
- [x] **N3: Document schema** (1.1)
`src/doc_agent/schema/models.py`: `Document` and `LineItem` per
`docs/03_data_and_extraction_spec.md`, with money→float and date→ISO
normalizers. Add `tests/test_schema.py`.
Check: `uv run pytest tests/test_schema.py -q`
Commit: `phase 1.1: pydantic document schema + normalizers`
- [x] **N4: Validation rules** (1.2)
`src/doc_agent/validation/rules.py`: hard rules H1–H4, soft rules S1–S4,
monetary epsilon, returning a structured report. Pure functions, no I/O. Add
`tests/test_validation.py` (reconciling totals pass H2/H3; mismatches fail;
soft failures recorded without forcing review).
Check: `uv run pytest tests/test_validation.py -q`
Commit: `phase 1.2: validation rules (hard/soft + arithmetic checks)`
- [x] **N5: Confidence & routing** (1.3)
`src/doc_agent/routing/score.py`: pure `score(data, report, model_signal)` and
`route(score, report)` with hard-failure short-circuit. Add
`tests/test_routing.py` (hard fail ⇒ review regardless of score; threshold
boundary; missing required fields lower score).
Check: `uv run pytest tests/test_routing.py -q`
Commit: `phase 1.3: confidence scoring + routing decision`
- [x] **N6: Modality detection** (2.1)
`src/doc_agent/parsing/detect.py`: map a file to `native_pdf | image` by
extension/MIME. Add `tests/test_detect.py`.
Check: `uv run pytest tests/test_detect.py -q`
Commit: `phase 2.1: modality detection`
- [x] **N7: Backend interface + offline stub** (2.4 + stub)
`src/doc_agent/backends/base.py`: the `ExtractionBackend` protocol,
`BackendResult`, and a factory built from config.
`src/doc_agent/backends/stub.py`: a `StubBackend` returning deterministic,
schema-valid `Document` data with no network (this is the stub the smoke test
and CLAUDE.md reference). **Do NOT implement `gemini.py` or `ollama.py`
tonight.** Add `tests/test_backends.py` (factory returns configured backend;
unknown backend ⇒ clear error; stub returns schema-valid data).
Check: `uv run pytest tests/test_backends.py -q`
Commit: `phase 2.4: backend interface + factory + offline stub backend`
- [x] **N8: Core pipeline orchestration** (3.1, stub-backed)
`src/doc_agent/core.py`: `process_document(path) -> ExtractionResult` chaining
detect → acquire → `backend.extract` → validate → score → route. Free of
side effects (no file moves, no DB). For tonight, define the `acquire`
interface and a minimal injectable path so the smoke test can supply a fake
payload; **real Docling/OCR `acquire` is a TOMORROW task**. Define the
`ExtractionResult` type here. Add `tests/test_core_smoke.py` running
end-to-end with `StubBackend` + an injected payload (no network), asserting a
populated result with a decision.
Check: `uv run pytest tests/test_core_smoke.py -q`
Commit: `phase 3.1: core pipeline orchestration (stub-backed smoke test)`
- [x] **N9: Idempotency helper** (3.2)
A content-hash helper so the same file is not reprocessed across runs. Add
`tests/test_hash.py` (same file → same hash).
Check: `uv run pytest tests/test_hash.py -q`
Commit: `phase 3.2: content-hash idempotency helper`
- [x] **N-FINAL: Full green gate**
Run the whole suite and linter; fix anything red until clean.
Check: `uv run pytest -q && uv run ruff check .`
Commit: `chore: full suite + lint green` (only if any fixes were needed)
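
The arithmetic checks in N4 and the hard-failure short-circuit in N5 can be illustrated with pure functions. This is a minimal sketch, not the code in `rules.py` or `score.py`: the exact meaning of H2 and H3, the `EPSILON` value, the `Report` shape, the 0.8 threshold, and the `"auto_accept"` label are all assumptions made for illustration (only `"review"` is named in this ledger).

```python
from dataclasses import dataclass, field

EPSILON = 0.01  # assumed monetary tolerance, in currency units


@dataclass
class Report:
    """Hypothetical structured report: hard failures force review, soft ones do not."""
    hard_failures: list[str] = field(default_factory=list)
    soft_failures: list[str] = field(default_factory=list)


def check_totals(line_amounts: list[float], subtotal: float,
                 tax: float, total: float) -> Report:
    """Pure arithmetic checks; the H2/H3 definitions here are assumed."""
    report = Report()
    if abs(sum(line_amounts) - subtotal) > EPSILON:
        report.hard_failures.append("H2: line items do not sum to subtotal")
    if abs(subtotal + tax - total) > EPSILON:
        report.hard_failures.append("H3: subtotal + tax does not equal total")
    return report


def route(score: float, report: Report, threshold: float = 0.8) -> str:
    """Any hard failure short-circuits to review, regardless of score."""
    if report.hard_failures:
        return "review"
    # Assumption: a score exactly at the threshold is accepted.
    return "auto_accept" if score >= threshold else "review"
```

Keeping both functions free of I/O means the tests in `tests/test_validation.py` and `tests/test_routing.py` can call them with plain values.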
---
## ⛔ STOP: END OF AUTONOMOUS SCOPE
Do not proceed past this line. Everything below requires API keys, a local
model server, large/finicky native dependencies, or live datasets. All of that
is for the morning, supervised.
---
## TOMORROW (DO NOT START: supervised, needs setup)
- [ ] Add deferred deps: `docling`, (`paddleocr` or `pytesseract`),
`google-genai`, `ollama`, `gradio`, `watchdog`.
- [ ] 2.2 Docling parser (downloads layout models on first run).
- [ ] 2.3 OCR path (PaddleOCR/Tesseract; watch the Paddle/3.11 wheel risk and
fall back to pytesseract if it won't resolve).
- [ ] Wire the real Docling/OCR `acquire` into `core.py`.
- [ ] 2.5 Gemini backend: **needs `GEMINI_API_KEY`**.
- [ ] 2.6 Ollama backend: **needs a local Ollama server + pulled model**.
- [ ] 4.1 persistence (SQLite + CSV); 4.2 watcher; 4.3 Gradio web demo.
- [ ] 5 evaluation harness: needs datasets + a real backend; **counts against
the Gemini free-tier daily limit**, so run deliberately.
- [ ] 6 deploy to Hugging Face Spaces.
---
## BLOCKED
(none yet; the loop appends one-line entries here)
## RUN LOG
(the loop may append short per-iteration notes here)