# PROGRESS.md — Autonomous Build Ledger

This file is the memory for the overnight run. It outlives any single agent
session. The loop reads it at the start of every iteration, does the **next
unchecked task in the TONIGHT section only**, proves the task's acceptance
check, commits, and ticks the box.

## Protocol (read every iteration)

1. Do the next unchecked task under **TONIGHT**, exactly one per iteration.
2. Implement it per `docs/05_build_plan.md` and the other specs in `docs/`.
3. Run the task's **Check** command and paste the output (the loop's evaluator
   only sees what is printed).
4. If it passes: commit just that task's changes with the listed **Commit**
   message; tick the box; commit the ledger update.
5. If it fails after reasonable attempts: add a one-line note under **BLOCKED**,
   commit, and move to the next task. Do not halt the whole run.
6. **Never** start a task under **TOMORROW**.
7. **Never** add a dependency outside the night dependency set (listed below
   and installed in task **N1**).
8. The specs already exist in `docs/` — do not regenerate or edit them.

When every TONIGHT box is checked, run `uv run pytest -q` and
`uv run ruff check .`; if both are clean, print `DONE_ALL` and stop.
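
As an illustration of step 1, the following is a minimal sketch of how a loop might locate the next unchecked task. It assumes the checklist format used in this file (`- [ ]` or `- [x]` followed by a bold title under a `## TONIGHT` heading). The function name `next_tonight_task` is hypothetical and not part of the specs.

```python
import re
from typing import Optional

# Matches a checklist line of the form "- [ ] **<task title>** ...".
TASK_RE = re.compile(r"^- \[(?P<mark>[ x])\] \*\*(?P<title>.+?)\*\*")


def next_tonight_task(ledger_text: str) -> Optional[str]:
    """Return the title of the first unchecked task under '## TONIGHT', or None."""
    in_tonight = False
    for line in ledger_text.splitlines():
        if line.startswith("## "):
            # Only the section headed exactly "## TONIGHT" is in scope; any
            # other level-2 heading (STOP, TOMORROW, ...) turns scanning off.
            in_tonight = line.strip() == "## TONIGHT"
            continue
        if not in_tonight:
            continue
        match = TASK_RE.match(line)
        if match and match.group("mark") == " ":
            return match.group("title")
    # Every TONIGHT box is ticked: time for the final pytest + ruff gate.
    return None
```

A `None` result corresponds to the "every TONIGHT box is checked" condition above. Because the heading comparison is exact, tasks under TOMORROW are never returned.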

---

## Night dependency set (the ONLY packages to add tonight)

- Runtime: `pydantic`, `pydantic-settings`
- Dev: `pytest`, `ruff`

Everything else (docling, paddleocr/pytesseract, google-genai, ollama, gradio,
watchdog) belongs to TOMORROW and must NOT be added tonight.
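
One way to guard this rule mechanically is to compare declared requirements against the allow-list. The sketch below is an assumption, not part of the build plan: the helper names are hypothetical, and it takes requirement strings as input rather than assuming where `uv` records them in `pyproject.toml`.

```python
import re

# The night set from this ledger; anything else belongs to TOMORROW.
NIGHT_RUNTIME = {"pydantic", "pydantic-settings"}
NIGHT_DEV = {"pytest", "ruff"}


def requirement_name(requirement: str) -> str:
    """Extract the normalized package name from a string like 'ruff>=0.4'."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    if match is None:
        raise ValueError(f"cannot parse requirement: {requirement!r}")
    # PEP 503 normalization: lowercase; runs of '-', '_' and '.' become '-'.
    return re.sub(r"[-_.]+", "-", match.group(1)).lower()


def outside_night_set(requirements, allowed):
    """Return the sorted package names in `requirements` not in `allowed`."""
    return sorted({requirement_name(r) for r in requirements} - set(allowed))
```

An empty list from `outside_night_set` means nothing outside the night set was declared. A non-empty list names the offending packages.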

---

## TONIGHT

- [x] **N1 — Project scaffold** (build plan 0.1)
  Run `uv init`; `uv python pin 3.11`; set `requires-python = ">=3.11"`. Create
  the `src/doc_agent/` package tree and `tests/` from the setup-doc layout (the
  `docs/` specs are already present — leave them). Add `.gitignore` (ignore
  `data/`, `.env`, `.venv/`, caches; **commit** `uv.lock` and `.python-version`)
  and an empty `README.md`. Add night deps: `uv add pydantic pydantic-settings`
  and `uv add --dev pytest ruff`.
  Check: `uv sync && uv run python -c "import sys; assert sys.version_info[:2]==(3,11)" && cat .python-version`
  Commit: `phase 0.1: project scaffold (uv, py3.11, package layout)`

- [x] **N2 — Config loader** (0.2)
  `src/doc_agent/config.py` with pydantic-settings: load env, validate combos
  (gemini requires key; `vision_direct` requires a multimodal backend), fail
  fast with clear messages. Add `tests/test_config.py`.
  Check: `uv run pytest tests/test_config.py -q`
  Commit: `phase 0.2: config loader with validation`

- [x] **N3 — Document schema** (1.1)
  `src/doc_agent/schema/models.py`: `Document` and `LineItem` per
  `docs/03_data_and_extraction_spec.md`, with money→float and date→ISO
  normalizers. Add `tests/test_schema.py`.
  Check: `uv run pytest tests/test_schema.py -q`
  Commit: `phase 1.1: pydantic document schema + normalizers`

- [x] **N4 — Validation rules** (1.2)
  `src/doc_agent/validation/rules.py`: hard rules H1–H4, soft rules S1–S4,
  monetary epsilon, returning a structured report. Pure functions, no I/O. Add
  `tests/test_validation.py` (reconciling totals pass H2/H3; mismatches fail;
  soft failures recorded without forcing review).
  Check: `uv run pytest tests/test_validation.py -q`
  Commit: `phase 1.2: validation rules (hard/soft + arithmetic checks)`

- [x] **N5 — Confidence & routing** (1.3)
  `src/doc_agent/routing/score.py`: pure `score(data, report, model_signal)` and
  `route(score, report)` with hard-failure short-circuit. Add
  `tests/test_routing.py` (hard fail ⇒ review regardless of score; threshold
  boundary; missing required fields lower score).
  Check: `uv run pytest tests/test_routing.py -q`
  Commit: `phase 1.3: confidence scoring + routing decision`

- [x] **N6 — Modality detection** (2.1)
  `src/doc_agent/parsing/detect.py`: map a file to `native_pdf | image` by
  extension/MIME. Add `tests/test_detect.py`.
  Check: `uv run pytest tests/test_detect.py -q`
  Commit: `phase 2.1: modality detection`

- [x] **N7 — Backend interface + offline stub** (2.4 + stub)
  `src/doc_agent/backends/base.py`: the `ExtractionBackend` protocol,
  `BackendResult`, and a factory built from config.
  `src/doc_agent/backends/stub.py`: a `StubBackend` returning deterministic,
  schema-valid `Document` data with no network (this is the stub the smoke test
  and CLAUDE.md reference). **Do NOT implement `gemini.py` or `ollama.py`
  tonight.** Add `tests/test_backends.py` (factory returns configured backend;
  unknown backend ⇒ clear error; stub returns schema-valid data).
  Check: `uv run pytest tests/test_backends.py -q`
  Commit: `phase 2.4: backend interface + factory + offline stub backend`

- [x] **N8 — Core pipeline orchestration** (3.1, stub-backed)
  `src/doc_agent/core.py`: `process_document(path) -> ExtractionResult` chaining
  detect → acquire → `backend.extract` → validate → score → route. No side
  effects (no file moves, no DB). For tonight, define the `acquire`
  interface and a minimal injectable path so the smoke test can supply a fake
  payload — **real Docling/OCR `acquire` is a TOMORROW task**. Define the
  `ExtractionResult` type here. Add `tests/test_core_smoke.py` running
  end-to-end with `StubBackend` + an injected payload (no network), asserting a
  populated result with a decision.
  Check: `uv run pytest tests/test_core_smoke.py -q`
  Commit: `phase 3.1: core pipeline orchestration (stub-backed smoke test)`

- [x] **N9 — Idempotency helper** (3.2)
  A content-hash helper so the same file is not reprocessed across runs. Add
  `tests/test_hash.py` (same file → same hash).
  Check: `uv run pytest tests/test_hash.py -q`
  Commit: `phase 3.2: content-hash idempotency helper`

- [x] **N-FINAL — Full green gate**
  Run the whole suite and linter; fix anything red until clean.
  Check: `uv run pytest -q && uv run ruff check .`
  Commit: `chore: full suite + lint green` (only if any fixes were needed)
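
For reference, the idempotency helper in task N9 could look roughly like the sketch below. The function name `content_hash`, the choice of SHA-256, and the chunk size are assumptions made for illustration; the ledger only requires that the same file yields the same hash.

```python
import hashlib
from pathlib import Path


def content_hash(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file's bytes, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks so large scans are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Hashing the bytes rather than the path or modification time means a renamed or re-copied file is still recognized as already processed.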

---

## ⛔ STOP — END OF AUTONOMOUS SCOPE

Do not proceed past this line. Everything below requires API keys, a local
model server, large or finicky native dependencies, or live datasets, so it is
left for the supervised morning session.

---

## TOMORROW (DO NOT START — supervised, needs setup)

- [ ] Add deferred deps: `docling`, (`paddleocr` or `pytesseract`),
  `google-genai`, `ollama`, `gradio`, `watchdog`.
- [ ] 2.2 Docling parser (downloads layout models on first run).
- [ ] 2.3 OCR path (PaddleOCR/Tesseract — watch the Paddle/3.11 wheel risk;
  fall back to pytesseract if it won't resolve).
- [ ] Wire the real Docling/OCR `acquire` into `core.py`.
- [ ] 2.5 Gemini backend — **needs `GEMINI_API_KEY`**.
- [ ] 2.6 Ollama backend — **needs a local Ollama server + pulled model**.
- [ ] 4.1 persistence (SQLite + CSV); 4.2 watcher; 4.3 Gradio web demo.
- [ ] 5 evaluation harness — needs datasets + a real backend; **counts against
  the Gemini free-tier daily limit**, so run deliberately.
- [ ] 6 deploy to Hugging Face Spaces.

---

## BLOCKED

(none yet — the loop appends one-line entries here)

## RUN LOG

(the loop may append short per-iteration notes here)