# Project Setup, Stack & Deployment ## 1. Repository layout ``` doc-extraction-agent/ ├── CLAUDE.md # conventions & guardrails for the coding agent ├── README.md # quickstart + results table (eval evidence) ├── pyproject.toml # project + dependency declarations (managed by uv) ├── uv.lock # resolved, pinned dependency lock (committed) ├── .python-version # uv interpreter pin: 3.11 (committed) ├── .env.example # config template (no secrets committed) ├── docs/ │ ├── 01_requirements.md │ ├── 02_architecture.md │ ├── 03_data_and_extraction_spec.md │ └── 05_build_plan.md ├── src/doc_agent/ │ ├── __init__.py │ ├── config.py # loads env/config; selects backend │ ├── core.py # process_document(): the reusable pipeline │ ├── schema/ │ │ └── models.py # Pydantic Document, LineItem │ ├── parsing/ │ │ ├── detect.py # modality detection │ │ ├── docling_parser.py # native PDF → text/layout │ │ └── ocr.py # image → text (optional path) │ ├── backends/ │ │ ├── base.py # ExtractionBackend protocol + factory │ │ ├── gemini.py # free-tier multimodal adapter │ │ └── ollama.py # local model adapter │ ├── validation/ │ │ └── rules.py # hard/soft rules → report │ ├── routing/ │ │ └── score.py # confidence + decision (pure) │ ├── store/ │ │ ├── db.py # SQLite writer │ │ └── export.py # CSV export │ ├── ingest/ │ │ └── watcher.py # folder watcher / poll loop (batch entry) │ └── web/ │ └── app.py # Gradio demo (URL entry) ├── eval/ │ ├── run_eval.py # metrics over labelled datasets │ └── datasets/ # download scripts / loaders (no data in git) ├── data/ # gitignored: inbox/ processed/ review/ exports/ │ ├── inbox/ │ ├── processed/ │ ├── review/ │ └── exports/ └── tests/ ├── test_validation.py ├── test_routing.py ├── test_schema.py └── test_core_smoke.py ``` ## 2. Stack - **Runtime:** Python **3.11**, pinned via `.python-version` (`uv python pin 3.11`). Chosen over 3.12 for broadest wheel coverage across the Torch-based Docling stack and PaddleOCR/PaddlePaddle, which lags newest Pythons. Declared range: `requires-python = ">=3.11"`. - **Package manager:** `uv` (manages the venv, resolves and locks deps via `uv.lock`; add deps with `uv add`, run with `uv run`). - **Parsing:** `docling` (native PDF/scan structure). Optional OCR: `paddleocr` or `pytesseract` + system Tesseract. - **Modeling:** `google-genai` (Gemini free tier) and a local `ollama` server (e.g. `qwen2.5:7b` or a 3B variant) reached over HTTP. - **Contract/validation:** `pydantic` v2. - **Web demo:** `gradio`. - **Storage:** stdlib `sqlite3` + `csv`. - **Watcher:** `watchdog` (or a stdlib poll loop for max portability). - **Config:** `pydantic-settings` / `python-dotenv`. - **Testing:** `pytest`. Dependencies are declared in `pyproject.toml` and pinned via the committed `uv.lock` (`uv sync` installs exactly that lock). Do not float the model identifier in code — it is config (see guardrails). ## 3. Configuration (`.env.example`) ``` # Backend selection: "gemini" | "ollama" EXTRACTION_BACKEND=gemini # Gemini (free tier via Google AI Studio key; no card required) GEMINI_API_KEY= GEMINI_MODEL=gemini-flash-latest # identifier is config, not hardcoded # Ollama (local) OLLAMA_HOST=http://localhost:11434 OLLAMA_MODEL=qwen2.5:7b # Image handling: "vision_direct" | "ocr_then_text" IMAGE_STRATEGY=vision_direct # vision_direct requires a multimodal backend # Routing CONFIDENCE_THRESHOLD=0.85 # tuned via eval # Paths (batch mode) INBOX_DIR=./data/inbox PROCESSED_DIR=./data/processed REVIEW_DIR=./data/review EXPORT_DIR=./data/exports DB_PATH=./data/agent.db ``` `config.py` validates these at startup and fails fast with a clear message if, e.g., `gemini` is selected with no key, or `vision_direct` is selected with a text-only backend. ## 4. Local setup ```bash # 1. Pin the interpreter to 3.11 (writes .python-version; uv fetches it if absent) uv python pin 3.11 # 2. Install (uv creates the venv on 3.11 and installs from pyproject.toml + uv.lock) uv sync # 3a. Gemini path: get a free AI Studio key, put it in .env # (free tier, no credit card; quota resets daily) # 3b. Ollama path (offline/private): # install Ollama, then: ollama pull qwen2.5:7b # set EXTRACTION_BACKEND=ollama and IMAGE_STRATEGY=ocr_then_text # 4. Create working dirs mkdir -p data/{inbox,processed,review,exports} ``` ## 5. Running **Autonomous batch mode:** ```bash uv run python -m doc_agent.ingest.watcher # drop files into data/inbox/ — accepted records land in SQLite + data/exports/, # uncertain ones move to data/review/ ``` **Web demo (local):** ```bash uv run python -m doc_agent.web.app # opens a Gradio URL; upload one document to see fields + confidence + decision ``` **Evaluation:** ```bash uv run python eval/run_eval.py --dataset sroie --split holdout # prints per-field precision/recall/F1 and auto-accept precision on critical fields ``` ## 6. Deployment to Hugging Face Spaces (free public demo URL) 1. Create a new **Space** → SDK: **Gradio** (free CPU tier). Set the Space's Python to **3.11** (the `python_version: "3.11"` field in the Space README metadata) so the deployed runtime matches the pinned local interpreter. 2. Add `app.py` at the Space root that imports and launches `doc_agent.web.app` (or copy the web entry there), plus a `requirements.txt` the Gradio builder can read — generate it from the uv-managed project rather than hand-maintaining it: `uv export --no-hashes --no-dev -o requirements.txt`. 3. Set **Repository secrets** in the Space: `GEMINI_API_KEY`, `EXTRACTION_BACKEND=gemini`, `IMAGE_STRATEGY=vision_direct`, `GEMINI_MODEL=gemini-flash-latest`. 4. Push; the Space builds and serves a public URL. **Free-tier realities to design around (and to note in the UI):** - CPU-only and the Space **sleeps when idle** → first request after idle has a cold start. This is why the cloud demo uses the **Gemini API** for inference rather than a local model, and why `vision_direct` (no heavy OCR in the Space) is the demo's image path. - **Stateless:** no persistent DB in the demo. Show the result; don't store it. - **Privacy:** the free Gemini tier may use inputs for training, so the demo must display a "synthetic/public documents only" notice and must not be used for real financial data. ## 7. What stays free - **Inference:** local Ollama (no quota, private) or Gemini free tier (~1,500 req/day, resets daily, no card) — far above dev volume. - **Hosting:** Hugging Face Spaces free CPU tier for the public demo. - **Storage:** local SQLite/CSV; nothing paid. No component requires a credit card or paid plan for development or demo.