PROGRESS.md: Autonomous Build Ledger
This file is the memory for the overnight run. It outlives any single agent session. The loop reads it at the start of every iteration, does the next unchecked task in the TONIGHT section only, proves the task's acceptance check, commits, and ticks the box.
Protocol (read every iteration)
- Do the next unchecked task under TONIGHT, exactly one per iteration.
- Implement it per `docs/05_build_plan.md` and the other specs in `docs/`.
- Run the task's Check command and paste the output (the loop's evaluator only sees what is printed).
- If it passes: commit just that task's changes with the listed Commit message; tick the box; commit the ledger update.
- If it fails after reasonable attempts: add a one-line note under BLOCKED, commit, and move to the next task. Do not halt the whole run.
- Never start a task under TOMORROW.
- Never add a dependency outside the night set in task N1.
- The specs already exist in `docs/`; do not regenerate or edit them.
When every TONIGHT box is checked, run `uv run pytest -q` and `uv run ruff check .`; if both are clean, print `DONE_ALL` and stop.
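The closing gate described above could be scripted roughly as follows. This is only a sketch: the names `final_gate` and `GATE_COMMANDS` are hypothetical and not part of the build plan, and stopping at the first failing command is an assumption (the ledger only says that both must be clean).

```python
import subprocess
from collections.abc import Sequence

# The two gate commands named in the protocol.
GATE_COMMANDS: list[list[str]] = [
    ["uv", "run", "pytest", "-q"],
    ["uv", "run", "ruff", "check", "."],
]


def final_gate(commands: Sequence[Sequence[str]] = GATE_COMMANDS) -> bool:
    """Run each command in order; print DONE_ALL only if every one exits with 0."""
    for command in commands:
        # check=False: inspect the exit code ourselves instead of raising.
        completed = subprocess.run(list(command), check=False)
        if completed.returncode != 0:
            # Assumption: stop at the first red command rather than running the rest.
            print(f"gate failed: {' '.join(command)} (exit {completed.returncode})")
            return False
    print("DONE_ALL")
    return True
```

Calling `final_gate()` with no arguments runs the two real commands; passing a different command list makes the function easy to exercise without `uv` installed.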
Night dependency set (the ONLY packages to add tonight)
- Runtime: `pydantic`, `pydantic-settings`
- Dev: `pytest`, `ruff`
Everything else (docling, paddleocr/pytesseract, google-genai, ollama, gradio, watchdog) belongs to TOMORROW and must NOT be added tonight.
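One way to guard the night set is a small check over the declared requirement strings. The sketch below is illustrative only; `package_name` and `outside_night_set` are hypothetical helpers, not part of the specs, and the name parsing is deliberately simple (it reads the leading distribution name and ignores extras, versions, and markers).

```python
import re

# The night set from this ledger: runtime and dev packages.
NIGHT_RUNTIME = {"pydantic", "pydantic-settings"}
NIGHT_DEV = {"pytest", "ruff"}


def package_name(requirement: str) -> str:
    """Extract the normalized distribution name from a string like 'pydantic>=2.7'."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    if match is None:
        raise ValueError(f"cannot parse requirement: {requirement!r}")
    # Normalize as package indexes do: lowercase, runs of '-', '_', '.' become '-'.
    return re.sub(r"[-_.]+", "-", match.group(1)).lower()


def outside_night_set(requirements: list[str], allowed: set[str]) -> set[str]:
    """Return the declared packages that are not in the allowed set."""
    return {package_name(req) for req in requirements} - allowed
```

The requirement strings could be read from `pyproject.toml` (for example with the standard-library `tomllib` on Python 3.11); where dev dependencies are stored there depends on the uv version, so that part is left out of the sketch. An empty result means nothing outside the night set was added.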
TONIGHT
N1: Project scaffold (build plan 0.1)
- Run `uv init`; `uv python pin 3.11`; set `requires-python = ">=3.11"`. Create the `src/doc_agent/` package tree and `tests/` from the setup-doc layout (the `docs/` specs are already present; leave them). Add `.gitignore` (ignore `data/`, `.env`, `.venv/`, caches; commit `uv.lock` and `.python-version`) and an empty `README.md`. Add night deps: `uv add pydantic pydantic-settings` and `uv add --dev pytest ruff`.
- Check: `uv sync && uv run python -c "import sys; assert sys.version_info[:2]==(3,11)" && cat .python-version`
- Commit: `phase 0.1: project scaffold (uv, py3.11, package layout)`
N2: Config loader (0.2)
- `src/doc_agent/config.py` with pydantic-settings: load env, validate combos (gemini requires key; `vision_direct` requires a multimodal backend), fail fast with clear messages. Add `tests/test_config.py`.
- Check: `uv run pytest tests/test_config.py -q`
- Commit: `phase 0.2: config loader with validation`
N3: Document schema (1.1)
- `src/doc_agent/schema/models.py`: `Document` and `LineItem` per `docs/03_data_and_extraction_spec.md`, with money→float and date→ISO normalizers. Add `tests/test_schema.py`.
- Check: `uv run pytest tests/test_schema.py -q`
- Commit: `phase 1.1: pydantic document schema + normalizers`
N4: Validation rules (1.2)
- `src/doc_agent/validation/rules.py`: hard rules H1-H4, soft rules S1-S4, monetary epsilon, returning a structured report. Pure functions, no I/O. Add `tests/test_validation.py` (reconciling totals pass H2/H3; mismatches fail; soft failures recorded without forcing review).
- Check: `uv run pytest tests/test_validation.py -q`
- Commit: `phase 1.2: validation rules (hard/soft + arithmetic checks)`
N5: Confidence & routing (1.3)
- `src/doc_agent/routing/score.py`: pure `score(data, report, model_signal)` and `route(score, report)` with hard-failure short-circuit. Add `tests/test_routing.py` (hard fail → review regardless of score; threshold boundary; missing required fields lower score).
- Check: `uv run pytest tests/test_routing.py -q`
- Commit: `phase 1.3: confidence scoring + routing decision`
N6: Modality detection (2.1)
- `src/doc_agent/parsing/detect.py`: map a file to `native_pdf | image` by extension/MIME. Add `tests/test_detect.py`.
- Check: `uv run pytest tests/test_detect.py -q`
- Commit: `phase 2.1: modality detection`
N7: Backend interface + offline stub (2.4 + stub)
- `src/doc_agent/backends/base.py`: the `ExtractionBackend` protocol, `BackendResult`, and a factory built from config. `src/doc_agent/backends/stub.py`: a `StubBackend` returning deterministic, schema-valid `Document` data with no network (this is the stub the smoke test and CLAUDE.md reference). Do NOT implement `gemini.py` or `ollama.py` tonight. Add `tests/test_backends.py` (factory returns configured backend; unknown backend → clear error; stub returns schema-valid data).
- Check: `uv run pytest tests/test_backends.py -q`
- Commit: `phase 2.4: backend interface + factory + offline stub backend`
N8: Core pipeline orchestration (3.1, stub-backed)
- `src/doc_agent/core.py`: `process_document(path) -> ExtractionResult` chaining detect → acquire → `backend.extract` → validate → score → route. Free of side effects (no file moves, no DB). For tonight, define the `acquire` interface and a minimal injectable path so the smoke test can supply a fake payload; the real Docling/OCR `acquire` is a TOMORROW task. Define the `ExtractionResult` type here. Add `tests/test_core_smoke.py` running end-to-end with `StubBackend` + an injected payload (no network), asserting a populated result with a decision.
- Check: `uv run pytest tests/test_core_smoke.py -q`
- Commit: `phase 3.1: core pipeline orchestration (stub-backed smoke test)`
N9: Idempotency helper (3.2)
- A content-hash helper so the same file is not reprocessed across runs. Add `tests/test_hash.py` (same file → same hash).
- Check: `uv run pytest tests/test_hash.py -q`
- Commit: `phase 3.2: content-hash idempotency helper`
N-FINAL: Full green gate
- Run the whole suite and linter; fix anything red until clean.
- Check: `uv run pytest -q && uv run ruff check .`
- Commit: `chore: full suite + lint green` (only if any fixes were needed)
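To give a feel for the two smallest pure helpers in the list above (N6 modality detection and N9 content hashing), here is a minimal sketch. It is not the spec'd implementation: the function names `detect_modality` and `content_hash`, the use of SHA-256, the chunk size, and raising `ValueError` for unsupported files are all assumptions made for illustration.

```python
import hashlib
import mimetypes
from pathlib import Path
from typing import Literal

# The two modalities named in task N6.
Modality = Literal["native_pdf", "image"]


def detect_modality(path: Path) -> Modality:
    """Classify a file as native_pdf or image from its extension-derived MIME type."""
    mime, _encoding = mimetypes.guess_type(str(path))
    if mime == "application/pdf":
        return "native_pdf"
    if mime is not None and mime.startswith("image/"):
        return "image"
    # Assumption: anything else is rejected rather than guessed at.
    raise ValueError(f"unsupported file type for {path.name!r} (MIME: {mime})")


def content_hash(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of the file's bytes, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Because the hash depends only on the bytes, the same file yields the same digest across runs even if it is renamed or moved, which is the property `tests/test_hash.py` is meant to check. Note that `detect_modality` looks only at the file name, so it never opens the file.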
STOP: END OF AUTONOMOUS SCOPE
Do not proceed past this line. Everything below requires API keys, a local model server, large/finicky native dependencies, or live datasets; all of that is for the morning, supervised.
TOMORROW (DO NOT START: supervised, needs setup)
- Add deferred deps: `docling`, (`paddleocr` or `pytesseract`), `google-genai`, `ollama`, `gradio`, `watchdog`.
- 2.2 Docling parser (downloads layout models on first run).
- 2.3 OCR path (PaddleOCR/Tesseract; watch the Paddle/3.11 wheel risk and fall back to pytesseract if it won't resolve).
- Wire the real Docling/OCR `acquire` into `core.py`.
- 2.5 Gemini backend: needs `GEMINI_API_KEY`.
- 2.6 Ollama backend: needs a local Ollama server + pulled model.
- 4.1 persistence (SQLite + CSV); 4.2 watcher; 4.3 Gradio web demo.
- 5 evaluation harness: needs datasets + a real backend; counts against the Gemini free-tier daily limit, so run deliberately.
- 6 deploy to Hugging Face Spaces.
BLOCKED
(none yet; the loop appends one-line entries here)
RUN LOG
(the loop may append short per-iteration notes here)