Spaces:
Running
Running
Project Setup, Stack & Deployment
1. Repository layout
doc-extraction-agent/
βββ CLAUDE.md # conventions & guardrails for the coding agent
βββ README.md # quickstart + results table (eval evidence)
βββ pyproject.toml # project + dependency declarations (managed by uv)
βββ uv.lock # resolved, pinned dependency lock (committed)
βββ .python-version # uv interpreter pin: 3.11 (committed)
βββ .env.example # config template (no secrets committed)
βββ docs/
β βββ 01_requirements.md
β βββ 02_architecture.md
β βββ 03_data_and_extraction_spec.md
β βββ 05_build_plan.md
βββ src/doc_agent/
β βββ __init__.py
β βββ config.py # loads env/config; selects backend
β βββ core.py # process_document(): the reusable pipeline
β βββ schema/
β β βββ models.py # Pydantic Document, LineItem
β βββ parsing/
β β βββ detect.py # modality detection
β β βββ docling_parser.py # native PDF β text/layout
β β βββ ocr.py # image β text (optional path)
β βββ backends/
β β βββ base.py # ExtractionBackend protocol + factory
β β βββ gemini.py # free-tier multimodal adapter
β β βββ ollama.py # local model adapter
β βββ validation/
β β βββ rules.py # hard/soft rules β report
β βββ routing/
β β βββ score.py # confidence + decision (pure)
β βββ store/
β β βββ db.py # SQLite writer
β β βββ export.py # CSV export
β βββ ingest/
β β βββ watcher.py # folder watcher / poll loop (batch entry)
β βββ web/
β βββ app.py # Gradio demo (URL entry)
βββ eval/
β βββ run_eval.py # metrics over labelled datasets
β βββ datasets/ # download scripts / loaders (no data in git)
βββ data/ # gitignored: inbox/ processed/ review/ exports/
β βββ inbox/
β βββ processed/
β βββ review/
β βββ exports/
βββ tests/
βββ test_validation.py
βββ test_routing.py
βββ test_schema.py
βββ test_core_smoke.py
2. Stack
- Runtime: Python 3.11, pinned via
.python-version(uv python pin 3.11). Chosen over 3.12 for broadest wheel coverage across the Torch-based Docling stack and PaddleOCR/PaddlePaddle, which lags newest Pythons. Declared range:requires-python = ">=3.11". - Package manager:
uv(manages the venv, resolves and locks deps viauv.lock; add deps withuv add, run withuv run). - Parsing:
docling(native PDF/scan structure). Optional OCR:paddleocrorpytesseract+ system Tesseract. - Modeling:
google-genai(Gemini free tier) and a localollamaserver (e.g.qwen2.5:7bor a 3B variant) reached over HTTP. - Contract/validation:
pydanticv2. - Web demo:
gradio. - Storage: stdlib
sqlite3+csv. - Watcher:
watchdog(or a stdlib poll loop for max portability). - Config:
pydantic-settings/python-dotenv. - Testing:
pytest.
Dependencies are declared in pyproject.toml and pinned via the committed
uv.lock (uv sync installs exactly that lock). Do not float the model
identifier in code β it is config (see guardrails).
3. Configuration (.env.example)
# Backend selection: "gemini" | "ollama"
EXTRACTION_BACKEND=gemini
# Gemini (free tier via Google AI Studio key; no card required)
GEMINI_API_KEY=
GEMINI_MODEL=gemini-flash-latest # identifier is config, not hardcoded
# Ollama (local)
OLLAMA_HOST=http://localhost:11434
OLLAMA_MODEL=qwen2.5:7b
# Image handling: "vision_direct" | "ocr_then_text"
IMAGE_STRATEGY=vision_direct # vision_direct requires a multimodal backend
# Routing
CONFIDENCE_THRESHOLD=0.85 # tuned via eval
# Paths (batch mode)
INBOX_DIR=./data/inbox
PROCESSED_DIR=./data/processed
REVIEW_DIR=./data/review
EXPORT_DIR=./data/exports
DB_PATH=./data/agent.db
config.py validates these at startup and fails fast with a clear message if,
e.g., gemini is selected with no key, or vision_direct is selected with a
text-only backend.
4. Local setup
# 1. Pin the interpreter to 3.11 (writes .python-version; uv fetches it if absent)
uv python pin 3.11
# 2. Install (uv creates the venv on 3.11 and installs from pyproject.toml + uv.lock)
uv sync
# 3a. Gemini path: get a free AI Studio key, put it in .env
# (free tier, no credit card; quota resets daily)
# 3b. Ollama path (offline/private):
# install Ollama, then:
ollama pull qwen2.5:7b
# set EXTRACTION_BACKEND=ollama and IMAGE_STRATEGY=ocr_then_text
# 4. Create working dirs
mkdir -p data/{inbox,processed,review,exports}
5. Running
Autonomous batch mode:
uv run python -m doc_agent.ingest.watcher
# drop files into data/inbox/ β accepted records land in SQLite + data/exports/,
# uncertain ones move to data/review/
Web demo (local):
uv run python -m doc_agent.web.app
# opens a Gradio URL; upload one document to see fields + confidence + decision
Evaluation:
uv run python eval/run_eval.py --dataset sroie --split holdout
# prints per-field precision/recall/F1 and auto-accept precision on critical fields
6. Deployment to Hugging Face Spaces (free public demo URL)
- Create a new Space β SDK: Gradio (free CPU tier). Set the Space's
Python to 3.11 (the
python_version: "3.11"field in the Space README metadata) so the deployed runtime matches the pinned local interpreter. - Add
app.pyat the Space root that imports and launchesdoc_agent.web.app(or copy the web entry there), plus arequirements.txtthe Gradio builder can read β generate it from the uv-managed project rather than hand-maintaining it:uv export --no-hashes --no-dev -o requirements.txt. - Set Repository secrets in the Space:
GEMINI_API_KEY,EXTRACTION_BACKEND=gemini,IMAGE_STRATEGY=vision_direct,GEMINI_MODEL=gemini-flash-latest. - Push; the Space builds and serves a public URL.
Free-tier realities to design around (and to note in the UI):
- CPU-only and the Space sleeps when idle β first request after idle has a
cold start. This is why the cloud demo uses the Gemini API for inference
rather than a local model, and why
vision_direct(no heavy OCR in the Space) is the demo's image path. - Stateless: no persistent DB in the demo. Show the result; don't store it.
- Privacy: the free Gemini tier may use inputs for training, so the demo must display a "synthetic/public documents only" notice and must not be used for real financial data.
7. What stays free
- Inference: local Ollama (no quota, private) or Gemini free tier (~1,500 req/day, resets daily, no card) β far above dev volume.
- Hosting: Hugging Face Spaces free CPU tier for the public demo.
- Storage: local SQLite/CSV; nothing paid.
No component requires a credit card or paid plan for development or demo.