Spaces:
Running
Running
| # Project Setup, Stack & Deployment | |
| ## 1. Repository layout | |
| ``` | |
| doc-extraction-agent/ | |
| βββ CLAUDE.md # conventions & guardrails for the coding agent | |
| βββ README.md # quickstart + results table (eval evidence) | |
| βββ pyproject.toml # project + dependency declarations (managed by uv) | |
| βββ uv.lock # resolved, pinned dependency lock (committed) | |
| βββ .python-version # uv interpreter pin: 3.11 (committed) | |
| βββ .env.example # config template (no secrets committed) | |
| βββ docs/ | |
| β βββ 01_requirements.md | |
| β βββ 02_architecture.md | |
| β βββ 03_data_and_extraction_spec.md | |
| β βββ 05_build_plan.md | |
| βββ src/doc_agent/ | |
| β βββ __init__.py | |
| β βββ config.py # loads env/config; selects backend | |
| β βββ core.py # process_document(): the reusable pipeline | |
| β βββ schema/ | |
| β β βββ models.py # Pydantic Document, LineItem | |
| β βββ parsing/ | |
| β β βββ detect.py # modality detection | |
| β β βββ docling_parser.py # native PDF β text/layout | |
| β β βββ ocr.py # image β text (optional path) | |
| β βββ backends/ | |
| β β βββ base.py # ExtractionBackend protocol + factory | |
| β β βββ gemini.py # free-tier multimodal adapter | |
| β β βββ ollama.py # local model adapter | |
| β βββ validation/ | |
| β β βββ rules.py # hard/soft rules β report | |
| β βββ routing/ | |
| β β βββ score.py # confidence + decision (pure) | |
| β βββ store/ | |
| β β βββ db.py # SQLite writer | |
| β β βββ export.py # CSV export | |
| β βββ ingest/ | |
| β β βββ watcher.py # folder watcher / poll loop (batch entry) | |
| β βββ web/ | |
| β βββ app.py # Gradio demo (URL entry) | |
| βββ eval/ | |
| β βββ run_eval.py # metrics over labelled datasets | |
| β βββ datasets/ # download scripts / loaders (no data in git) | |
| βββ data/ # gitignored: inbox/ processed/ review/ exports/ | |
| β βββ inbox/ | |
| β βββ processed/ | |
| β βββ review/ | |
| β βββ exports/ | |
| βββ tests/ | |
| βββ test_validation.py | |
| βββ test_routing.py | |
| βββ test_schema.py | |
| βββ test_core_smoke.py | |
| ``` | |
| ## 2. Stack | |
| - **Runtime:** Python **3.11**, pinned via `.python-version` (`uv python pin | |
| 3.11`). Chosen over 3.12 for broadest wheel coverage across the Torch-based | |
| Docling stack and PaddleOCR/PaddlePaddle, which lags newest Pythons. | |
| Declared range: `requires-python = ">=3.11"`. | |
| - **Package manager:** `uv` (manages the venv, resolves and locks deps via | |
| `uv.lock`; add deps with `uv add`, run with `uv run`). | |
| - **Parsing:** `docling` (native PDF/scan structure). Optional OCR: | |
| `paddleocr` or `pytesseract` + system Tesseract. | |
| - **Modeling:** `google-genai` (Gemini free tier) and a local `ollama` server | |
| (e.g. `qwen2.5:7b` or a 3B variant) reached over HTTP. | |
| - **Contract/validation:** `pydantic` v2. | |
| - **Web demo:** `gradio`. | |
| - **Storage:** stdlib `sqlite3` + `csv`. | |
| - **Watcher:** `watchdog` (or a stdlib poll loop for max portability). | |
| - **Config:** `pydantic-settings` / `python-dotenv`. | |
| - **Testing:** `pytest`. | |
| Dependencies are declared in `pyproject.toml` and pinned via the committed | |
| `uv.lock` (`uv sync` installs exactly that lock). Do not float the model | |
| identifier in code β it is config (see guardrails). | |
| ## 3. Configuration (`.env.example`) | |
| ``` | |
| # Backend selection: "gemini" | "ollama" | |
| EXTRACTION_BACKEND=gemini | |
| # Gemini (free tier via Google AI Studio key; no card required) | |
| GEMINI_API_KEY= | |
| GEMINI_MODEL=gemini-flash-latest # identifier is config, not hardcoded | |
| # Ollama (local) | |
| OLLAMA_HOST=http://localhost:11434 | |
| OLLAMA_MODEL=qwen2.5:7b | |
| # Image handling: "vision_direct" | "ocr_then_text" | |
| IMAGE_STRATEGY=vision_direct # vision_direct requires a multimodal backend | |
| # Routing | |
| CONFIDENCE_THRESHOLD=0.85 # tuned via eval | |
| # Paths (batch mode) | |
| INBOX_DIR=./data/inbox | |
| PROCESSED_DIR=./data/processed | |
| REVIEW_DIR=./data/review | |
| EXPORT_DIR=./data/exports | |
| DB_PATH=./data/agent.db | |
| ``` | |
| `config.py` validates these at startup and fails fast with a clear message if, | |
| e.g., `gemini` is selected with no key, or `vision_direct` is selected with a | |
| text-only backend. | |
| ## 4. Local setup | |
| ```bash | |
| # 1. Pin the interpreter to 3.11 (writes .python-version; uv fetches it if absent) | |
| uv python pin 3.11 | |
| # 2. Install (uv creates the venv on 3.11 and installs from pyproject.toml + uv.lock) | |
| uv sync | |
| # 3a. Gemini path: get a free AI Studio key, put it in .env | |
| # (free tier, no credit card; quota resets daily) | |
| # 3b. Ollama path (offline/private): | |
| # install Ollama, then: | |
| ollama pull qwen2.5:7b | |
| # set EXTRACTION_BACKEND=ollama and IMAGE_STRATEGY=ocr_then_text | |
| # 4. Create working dirs | |
| mkdir -p data/{inbox,processed,review,exports} | |
| ``` | |
| ## 5. Running | |
| **Autonomous batch mode:** | |
| ```bash | |
| uv run python -m doc_agent.ingest.watcher | |
| # drop files into data/inbox/ β accepted records land in SQLite + data/exports/, | |
| # uncertain ones move to data/review/ | |
| ``` | |
| **Web demo (local):** | |
| ```bash | |
| uv run python -m doc_agent.web.app | |
| # opens a Gradio URL; upload one document to see fields + confidence + decision | |
| ``` | |
| **Evaluation:** | |
| ```bash | |
| uv run python eval/run_eval.py --dataset sroie --split holdout | |
| # prints per-field precision/recall/F1 and auto-accept precision on critical fields | |
| ``` | |
| ## 6. Deployment to Hugging Face Spaces (free public demo URL) | |
| 1. Create a new **Space** β SDK: **Gradio** (free CPU tier). Set the Space's | |
| Python to **3.11** (the `python_version: "3.11"` field in the Space README | |
| metadata) so the deployed runtime matches the pinned local interpreter. | |
| 2. Add `app.py` at the Space root that imports and launches | |
| `doc_agent.web.app` (or copy the web entry there), plus a `requirements.txt` | |
| the Gradio builder can read β generate it from the uv-managed project rather | |
| than hand-maintaining it: `uv export --no-hashes --no-dev -o requirements.txt`. | |
| 3. Set **Repository secrets** in the Space: `GEMINI_API_KEY`, | |
| `EXTRACTION_BACKEND=gemini`, `IMAGE_STRATEGY=vision_direct`, | |
| `GEMINI_MODEL=gemini-flash-latest`. | |
| 4. Push; the Space builds and serves a public URL. | |
| **Free-tier realities to design around (and to note in the UI):** | |
| - CPU-only and the Space **sleeps when idle** β first request after idle has a | |
| cold start. This is why the cloud demo uses the **Gemini API** for inference | |
| rather than a local model, and why `vision_direct` (no heavy OCR in the | |
| Space) is the demo's image path. | |
| - **Stateless:** no persistent DB in the demo. Show the result; don't store it. | |
| - **Privacy:** the free Gemini tier may use inputs for training, so the demo | |
| must display a "synthetic/public documents only" notice and must not be used | |
| for real financial data. | |
| ## 7. What stays free | |
| - **Inference:** local Ollama (no quota, private) or Gemini free tier | |
| (~1,500 req/day, resets daily, no card) β far above dev volume. | |
| - **Hosting:** Hugging Face Spaces free CPU tier for the public demo. | |
| - **Storage:** local SQLite/CSV; nothing paid. | |
| No component requires a credit card or paid plan for development or demo. | |