---
title: Document-Audit RAG
emoji: 📑
colorFrom: blue
colorTo: indigo
sdk: streamlit
sdk_version: "1.39.0"
app_file: app.py
---

# DocuAudit AI

**DocuAudit AI** is a production-oriented FastAPI backend plus optional Streamlit UI for **multi-document RAG**: upload documents, build a Chroma vector index, ask grounded questions with citations, and retain a **SQLite audit trail** of every query.

## Architecture

```mermaid
flowchart LR
  subgraph ingest [Ingestion]
    A[PDF / TXT / MD] --> B[Loader]
    B --> C[Chunker]
    C --> D[Embedder]
    D --> E[(ChromaDB)]
  end
  subgraph query [Query path]
    Q[User question] --> R[Semantic search]
    R --> E
    R --> T[Top-K chunks]
    T --> L[LLM]
    L --> U[Answer + citations]
  end
  U --> V[(SQLite audit)]
```

ASCII equivalent:

```
PDF Upload → Parser → Chunker → Embedder → ChromaDB
                                              ↓
User Query → Semantic Search → Top-K Chunks → LLM → Answer + Citations
                                                            ↓
                                                    Audit Log (SQLite)
```

## Use cases

- **Litigation document analysis**: trace claims to exact pages and filenames.
- **Corporate finance review**: compare disclosures and filings under a consistent audit log.
- **Investigation support**: bulk ingest, async jobs, and reproducible query history.

## Deploying on Hugging Face Spaces

- Set **`LLM_PROVIDER=huggingface`**; use **`HUGGINGFACE_API_KEY`** and/or the Space secret **`HF_TOKEN`** (see [`.env.example`](.env.example)).
- Use root **`app.py`** as the Streamlit entry for the default Hub command.
- Hub UI, secrets, hardware, and Streamlit SDK details: [Streamlit Spaces](https://huggingface.co/docs/hub/spaces-sdks-streamlit), [Spaces overview](https://huggingface.co/docs/hub/spaces-overview).
- **Test locally before deploy:** `uv run python scripts/verify_huggingface_inference.py` (requires `LLM_PROVIDER=huggingface` in `.env`).

## Quick start with Docker

Requires [Docker Engine](https://docs.docker.com/engine/) and Compose v2.
The snippet below matches the shipped **`docker-compose.yml`**: API on **8000**, Streamlit on **8501**, with Chroma and SQLite under **`/data`** inside the API container. After **`docker compose up -d`**, expect **`curl http://localhost:8000/health`** to return JSON including **`"status":"ok"`**.

```bash
git clone <repo-url> doc-Audi-ai
cd doc-Audi-ai
cp .env.example .env
# edit .env as needed; for compose Ollama: OLLAMA_BASE_URL=http://ollama:11434
# (with host Ollama: run `ollama serve`; compose defaults to host.docker.internal:11434)
docker compose build
docker compose up -d
curl -s http://localhost:8000/health
# Streamlit UI: http://localhost:8501
docker compose down
```

Optional all-in-one Ollama in Compose: `docker compose --profile ollama up -d` (then set `OLLAMA_BASE_URL=http://ollama:11434` in `.env` and recreate containers).

## How it works (user workflow)

Collections, ingestion vs querying, jobs vs audit, Streamlit tabs, and **per-button UI flows**: **[docs/USER_WORKFLOW.md](docs/USER_WORKFLOW.md)**.

## Run and test (step-by-step)

For ingestion formats, URL rules, job polling, sample `sample.txt` walkthrough, curl/PowerShell examples, and troubleshooting, see **[docs/RUN_AND_TEST_GUIDE.md](docs/RUN_AND_TEST_GUIDE.md)**.

For SQLite vs Memcached, offline DB inspection, and the Cursor **SQLite Viewer** extension (`qwtel.sqlite-viewer`), see **[docs/SQLITE_AND_DB_INSPECTION.md](docs/SQLITE_AND_DB_INSPECTION.md)**.
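
For scripted smoke tests, the health check from the Docker quick start can be polled from Python instead of being run once with `curl`. The following is a minimal sketch, not part of the repository: the helper names `is_healthy` and `wait_for_health`, the retry count, and the delay are assumptions chosen for illustration. Only the `/health` URL and the `"status":"ok"` field come from this README.

```python
import json
import time
import urllib.error
import urllib.request


def is_healthy(body: bytes) -> bool:
    """Return True if the response body is a JSON object with status == "ok"."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def wait_for_health(
    url: str = "http://localhost:8000/health",
    attempts: int = 30,      # hypothetical retry budget
    delay_s: float = 2.0,    # hypothetical pause between attempts
) -> bool:
    """Poll the health endpoint until it reports ok or the attempts run out."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if is_healthy(resp.read()):
                    return True
        except (urllib.error.URLError, OSError):
            pass  # the API container may still be starting
        time.sleep(delay_s)
    return False
```

Calling `wait_for_health()` right after `docker compose up -d` gives a simple gate before running further requests against the API.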
## Quick start (local, without Docker)

Run the API with **uv** (or your preferred tool):

```bash
git clone <repo-url> doc-Audi-ai
cd doc-Audi-ai
cp .env.example .env
uv sync
# Pull the Ollama chat and embedding models
ollama pull llama3.1:8b
ollama pull nomic-embed-text
# Start the API with auto-reload
uv run uvicorn api.main:app --host 0.0.0.0 --port 8000 --reload
# Alternative: limit reload watching to the api and storage directories
uv run uvicorn api.main:app --host 0.0.0.0 --port 8000 --reload --reload-dir api --reload-dir storage
```

Optional UI:

```bash
uv run streamlit run streamlit_app.py --server.port 8501 --server.address 0.0.0.0
```

## API overview

| Method | Path | Description |
|--------|------|-------------|
| GET | `/health` | Liveness; returns configured app name and version |
| POST | `/ingest/upload` | Multipart **`files`** (one or more); queues background ingest job |
| POST | `/ingest/url` | JSON **`urls`** array (1–100); download and queue ingest |
| GET | `/ingest/collections` | Lists collections with **`document_count`** and optional **`created_at`** |
| DELETE | `/ingest/collection/{collection_name}` | Drops a collection; returns **`documents_removed`** |
| GET | `/jobs` | Lists jobs with **`total`** count |
| GET | `/jobs/{job_id}` | Job status with **`progress_percent`**, file counters, timestamps, **`errors`** |
| POST | `/query/ask` | Grounded answer; request includes **`top_k`**, **`user_id`** |
| POST | `/query/summarise` | Collection summary; distinct response shape (`summary`, `document_count`, …) |
| POST | `/query` | Legacy alias of **`/query/ask`** |
| GET | `/audit/logs` | Filterable audit index (`user_id`, `from_date`, `to_date`, pagination) |
| GET | `/audit/logs/{query_id}` | Full stored answer and citations for one query |

Interactive docs: `http://localhost:8000/docs`.
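
The `POST /query/ask` and `GET /audit/logs/{query_id}` rows above can be exercised from Python with only the standard library. This is a hedged sketch rather than a shipped client: the function names, the `BASE_URL` constant, and the timeout are assumptions, while the request fields (`question`, `collection_name`, `top_k`, `user_id`) and the `query_id` response field follow this README's sample request and response.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: local quick-start default


def build_ask_request(question, collection_name="default", top_k=5,
                      user_id="analyst_001", base_url=BASE_URL):
    """Build (but do not send) a POST /query/ask request."""
    payload = {
        "question": question,
        "collection_name": collection_name,
        "top_k": top_k,
        "user_id": user_id,
    }
    return urllib.request.Request(
        f"{base_url}/query/ask",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def audit_log_url(query_id, base_url=BASE_URL):
    """URL of the stored answer and citations for one query."""
    return f"{base_url}/audit/logs/{urllib.parse.quote(query_id, safe='')}"


def ask_and_fetch_audit(question, **kwargs):
    """Send the question, then read back its audit record by query_id."""
    with urllib.request.urlopen(build_ask_request(question, **kwargs), timeout=120) as resp:
        answer = json.load(resp)
    with urllib.request.urlopen(audit_log_url(answer["query_id"]), timeout=30) as resp:
        return answer, json.load(resp)
```

With the API running, `ask_and_fetch_audit("What were the key risk factors?")` returns the live answer together with its persisted audit entry, which is a quick way to confirm that a query was logged.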
## Sample request and response (`POST /query/ask`)

Request:

```json
{
  "question": "What were the key risk factors identified in the Q3 2023 financial report?",
  "collection_name": "default",
  "top_k": 5,
  "user_id": "analyst_001"
}
```

Response (shape; values depend on your documents and model):

```json
{
  "query_id": "uuid-string",
  "question": "What were the key risk factors identified in the Q3 2023 financial report?",
  "answer": "… grounded text with citations …",
  "sources": [
    {
      "document_name": "q3_financial_report.pdf",
      "page_number": 12,
      "chunk_text": "Key risk factors include …",
      "relevance_score": 0.91
    }
  ],
  "model_used": "llama3.1:8b",
  "tokens_used": 0,
  "response_time_ms": 1820,
  "timestamp": "2026-05-03T12:00:00Z"
}
```

## Design decisions

- **Source citations**: High-stakes review requires every substantive claim to be tied to **document name** and **page** (where available), not a free-floating model monologue.
- **Auditability**: Each ask/summarise persists **query id**, **user id**, timing, model id, token usage (when the provider exposes it), and serialized sources so regulators or counsel can reconstruct what the system returned.

## Scale note

Architecture is designed for **high-volume document ingestion** via **async background jobs** (FastAPI `BackgroundTasks`), persistent Chroma collections, and a stateless API tier that can be replicated once you add a shared vector store and job queue.

## Tests

Automated API tests use **pytest** with isolated temp databases; they do **not** require a running server or Ollama.

```bash
uv sync
uv run pytest tests/ -q
```

Full guide (commands, coverage by file, mocks vs manual smoke tests, troubleshooting): **[docs/TESTING.md](docs/TESTING.md)**.

## Configuration

See **`.env.example`**.
Common variables include `LLM_PROVIDER`, Ollama/OpenAI/Anthropic keys and models, `CHROMA_PERSIST_DIRECTORY`, `AUDIT_DB_PATH`, `JOBS_DB_PATH`, and upload limits (`MAX_FILE_SIZE_MB`; **`MAX_UPLOAD_SIZE_MB`** is accepted as an alias via settings normalization).
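
To illustrate what alias normalization for the upload limit could look like, here is a minimal sketch. It is not the project's settings code: the function name, the precedence (the primary `MAX_FILE_SIZE_MB` wins over the alias `MAX_UPLOAD_SIZE_MB`), and the default of 50 MB are all assumptions made for the example.

```python
from typing import Mapping

DEFAULT_MAX_FILE_SIZE_MB = 50  # hypothetical default, not taken from the project


def resolve_max_file_size_mb(
    env: Mapping[str, str],
    default: int = DEFAULT_MAX_FILE_SIZE_MB,
) -> int:
    """Return the upload limit in MB, checking the primary name before the alias."""
    # Assumed precedence: MAX_FILE_SIZE_MB first, then MAX_UPLOAD_SIZE_MB.
    for name in ("MAX_FILE_SIZE_MB", "MAX_UPLOAD_SIZE_MB"):
        raw = env.get(name)
        if raw is not None and raw.strip():
            value = int(raw)
            if value <= 0:
                raise ValueError(f"{name} must be a positive integer, got {raw!r}")
            return value
    return default
```

Passing `os.environ` as `env` resolves the limit from the process environment; passing a plain dict keeps the function easy to test.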