Mayank Chugh
---
title: Document-Audit RAG
emoji: 📑
colorFrom: blue
colorTo: indigo
sdk: streamlit
sdk_version: "1.39.0"
app_file: app.py
---
# DocuAudit AI
**DocuAudit AI** is a production-oriented FastAPI backend plus optional Streamlit UI for **multi-document RAG**: upload documents, build a Chroma vector index, ask grounded questions with citations, and retain a **SQLite audit trail** of every query.
## Architecture
```mermaid
flowchart LR
subgraph ingest [Ingestion]
A[PDF / TXT / MD] --> B[Loader]
B --> C[Chunker]
C --> D[Embedder]
D --> E[(ChromaDB)]
end
subgraph query [Query path]
Q[User question] --> R[Semantic search]
R --> E
R --> T[Top-K chunks]
T --> L[LLM]
L --> U[Answer + citations]
end
U --> V[(SQLite audit)]
```
ASCII equivalent:
```
PDF Upload → Parser → Chunker → Embedder → ChromaDB
                                              ↓
User Query → Semantic Search → Top-K Chunks → LLM → Answer + Citations
                                                            ↓
                                                    Audit Log (SQLite)
```
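The two paths in the diagram can be illustrated with a toy, standard-library-only sketch. It is not the project's code: `chunk_text`, `embed`, `cosine`, and `top_k` are hypothetical names, a bag-of-words counter stands in for the real embedding model, and a plain list stands in for ChromaDB.

```python
import math
import re
from collections import Counter


def chunk_text(text, size=200, overlap=50):
    """Split text into character windows of `size` that overlap by `overlap`."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    if not text:
        return []
    step = size - overlap
    # Stop once the remaining tail is already covered by the previous window.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real embedder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_k(question, chunks, k=2):
    """Return the k (score, chunk) pairs most similar to the question."""
    q = embed(question)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

In the real pipeline the retrieved chunks would then be passed to the LLM together with the question; that step is omitted here.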
## Use cases
- **Litigation document analysis**: trace claims to exact pages and filenames.
- **Corporate finance review**: compare disclosures and filings under a consistent audit log.
- **Investigation support**: bulk ingest, async jobs, and reproducible query history.
## Deploying on Hugging Face Spaces
- Set **`LLM_PROVIDER=huggingface`**; use **`HUGGINGFACE_API_KEY`** and/or the Space secret **`HF_TOKEN`** (see [`.env.example`](.env.example)).
- Use root **`app.py`** as the Streamlit entry for the default Hub command.
- Hub UI, secrets, hardware, and Streamlit SDK details: [Streamlit Spaces](https://huggingface.co/docs/hub/spaces-sdks-streamlit), [Spaces overview](https://huggingface.co/docs/hub/spaces-overview).
- **Test locally before deploy:** `uv run python scripts/verify_huggingface_inference.py` (requires `LLM_PROVIDER=huggingface` in `.env`).
## Quick start with Docker
Requires [Docker Engine](https://docs.docker.com/engine/) and Compose v2. The snippet below matches the shipped **`docker-compose.yml`**: API on **8000**, Streamlit on **8501**, with Chroma and SQLite under **`/data`** inside the API container. After **`docker compose up -d`**, expect **`curl http://localhost:8000/health`** to return JSON including **`"status":"ok"`**.
```bash
git clone <repository-url> doc-Audi-ai
cd doc-Audi-ai
cp .env.example .env
# Edit .env as needed.
# Host Ollama (Compose default): run `ollama serve`; containers reach it at host.docker.internal:11434.
# Ollama inside Compose: set OLLAMA_BASE_URL=http://ollama:11434
docker compose build
docker compose up -d
curl -s http://localhost:8000/health
# Streamlit UI: http://localhost:8501
docker compose down
```
Optional all-in-one Ollama in Compose: `docker compose --profile ollama up -d` (then set `OLLAMA_BASE_URL=http://ollama:11434` in `.env` and recreate containers).
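When scripting against the stack, it can help to wait for the API before sending requests. The sketch below polls `/health` until the body reports `"status": "ok"`. `fetch_health` and `wait_until_healthy` are hypothetical helper names, and the attempt count and delay are arbitrary choices.

```python
import json
import time
import urllib.request


def fetch_health(url="http://localhost:8000/health"):
    """GET the health endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_until_healthy(fetch=fetch_health, attempts=30, delay_s=1.0):
    """Return True once fetch() reports status 'ok', False after `attempts` tries."""
    for _ in range(attempts):
        try:
            if fetch().get("status") == "ok":
                return True
        except (OSError, ValueError):
            # OSError covers refused connections and URLError;
            # ValueError covers a body that is not valid JSON yet.
            pass
        time.sleep(delay_s)
    return False
```

Passing `fetch` as a parameter keeps the retry logic testable without a running container.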
## How it works (user workflow)
Collections, ingestion vs querying, jobs vs audit, Streamlit tabs, and **per-button UI flows**: **[docs/USER_WORKFLOW.md](docs/USER_WORKFLOW.md)**.
## Run and test (step-by-step)
For ingestion formats, URL rules, job polling, sample `sample.txt` walkthrough, curl/PowerShell examples, and troubleshooting, see **[docs/RUN_AND_TEST_GUIDE.md](docs/RUN_AND_TEST_GUIDE.md)**.
For SQLite vs Memcached, offline DB inspection, and the Cursor **SQLite Viewer** extension (`qwtel.sqlite-viewer`), see **[docs/SQLITE_AND_DB_INSPECTION.md](docs/SQLITE_AND_DB_INSPECTION.md)**.
## Quick start (local, without Docker)
Run the API with **uv** (or your preferred tool):
```bash
git clone <repository-url> doc-Audi-ai
cd doc-Audi-ai
cp .env.example .env
uv sync
ollama pull llama3.1:8b
ollama pull nomic-embed-text
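# Start the API with auto-reload:
uv run uvicorn api.main:app --host 0.0.0.0 --port 8000 --reload
# Alternative: watch only api/ and storage/ for changes (run one of the two commands, not both):
uv run uvicorn api.main:app --host 0.0.0.0 --port 8000 --reload --reload-dir api --reload-dir storage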
```
Optional UI:
```bash
uv run streamlit run streamlit_app.py --server.port 8501 --server.address 0.0.0.0
```
## API overview
| Method | Path | Description |
|--------|------|-------------|
| GET | `/health` | Liveness; returns configured app name and version |
| POST | `/ingest/upload` | Multipart **`files`** (one or more); queues background ingest job |
| POST | `/ingest/url` | JSON **`urls`** array (1–100); download and queue ingest |
| GET | `/ingest/collections` | Lists collections with **`document_count`** and optional **`created_at`** |
| DELETE | `/ingest/collection/{collection_name}` | Drops a collection; returns **`documents_removed`** |
| GET | `/jobs` | Lists jobs with **`total`** count |
| GET | `/jobs/{job_id}` | Job status with **`progress_percent`**, file counters, timestamps, **`errors`** |
| POST | `/query/ask` | Grounded answer; request includes **`top_k`**, **`user_id`** |
| POST | `/query/summarise` | Collection summary; distinct response shape (`summary`, `document_count`, …) |
| POST | `/query` | Legacy alias of **`/query/ask`** |
| GET | `/audit/logs` | Filterable audit index (`user_id`, `from_date`, `to_date`, pagination) |
| GET | `/audit/logs/{query_id}` | Full stored answer and citations for one query |
Interactive docs: `http://localhost:8000/docs`.
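As one way to script the audit endpoint, the sketch below builds a filtered `GET /audit/logs` URL from the three filters named in the table. `audit_logs_url` is a hypothetical helper, the ISO-style date strings are an assumption about the accepted format, and pagination parameters are left out because their names are not listed here.

```python
from urllib.parse import urlencode


def audit_logs_url(base_url, user_id=None, from_date=None, to_date=None):
    """Build the /audit/logs URL, including only the filters that are set."""
    filters = {"user_id": user_id, "from_date": from_date, "to_date": to_date}
    query = urlencode({k: v for k, v in filters.items() if v is not None})
    path = base_url.rstrip("/") + "/audit/logs"
    return f"{path}?{query}" if query else path
```

For example, `audit_logs_url("http://localhost:8000", user_id="analyst_001")` yields a URL that can be passed to `curl` or `urllib.request.urlopen`.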
## Sample request and response (`POST /query/ask`)
Request:
```json
{
"question": "What were the key risk factors identified in the Q3 2023 financial report?",
"collection_name": "default",
"top_k": 5,
"user_id": "analyst_001"
}
```
Response (shape; values depend on your documents and model):
```json
{
"query_id": "uuid-string",
"question": "What were the key risk factors identified in the Q3 2023 financial report?",
"answer": "… grounded text with citations …",
"sources": [
{
"document_name": "q3_financial_report.pdf",
"page_number": 12,
"chunk_text": "Key risk factors include …",
"relevance_score": 0.91
}
],
"model_used": "llama3.1:8b",
"tokens_used": 0,
"response_time_ms": 1820,
"timestamp": "2026-05-03T12:00:00Z"
}
```
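A minimal client-side sketch for this endpoint, using only the standard library: it builds the request shown above and turns the `sources` array into readable citation lines. `build_ask_request` and `format_citations` are hypothetical names, and treating `collection_name`, `top_k`, and `user_id` as optional with these defaults is an assumption; check the interactive docs for the actual schema.

```python
import json
import urllib.request


def build_ask_request(base_url, question, collection_name="default", top_k=5, user_id=None):
    """Prepare (but do not send) a POST /query/ask request."""
    payload = {"question": question, "collection_name": collection_name, "top_k": top_k}
    if user_id is not None:
        payload["user_id"] = user_id
    return urllib.request.Request(
        base_url.rstrip("/") + "/query/ask",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def format_citations(response):
    """Render each source as 'document, p. N (score S)'."""
    lines = []
    for src in response.get("sources", []):
        page = src.get("page_number")
        where = f"p. {page}" if page is not None else "page n/a"
        lines.append(f'{src["document_name"]}, {where} (score {src["relevance_score"]:.2f})')
    return lines
```

To send the request, pass it to `urllib.request.urlopen` and decode the body with `json.load`; the result can go straight into `format_citations`.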
## Design decisions
- **Source citations**: High-stakes review requires every substantive claim to be tied to **document name** and **page** (where available), not a free-floating model monologue.
- **Auditability**: Each ask/summarise persists **query id**, **user id**, timing, model id, token usage (when the provider exposes it), and serialized sources so regulators or counsel can reconstruct what the system returned.
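To make the auditability point concrete, here is a minimal sketch of persisting one response to SQLite and reading it back. The `audit_log` table, its columns, and the `record_query` / `load_query` helpers are hypothetical and only mirror the fields of the sample response; the project's actual schema may differ.

```python
import json
import sqlite3

# Hypothetical schema: one row per answered query, sources stored as JSON text.
SCHEMA = """
CREATE TABLE IF NOT EXISTS audit_log (
    query_id TEXT PRIMARY KEY,
    user_id TEXT,
    question TEXT NOT NULL,
    answer TEXT NOT NULL,
    model_used TEXT,
    tokens_used INTEGER,
    response_time_ms INTEGER,
    timestamp TEXT NOT NULL,
    sources_json TEXT NOT NULL
)
"""


def record_query(conn, response, user_id):
    """Insert one query response, creating the table on first use."""
    conn.execute(SCHEMA)
    conn.execute(
        "INSERT INTO audit_log VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
        (
            response["query_id"],
            user_id,
            response["question"],
            response["answer"],
            response.get("model_used"),
            response.get("tokens_used"),
            response.get("response_time_ms"),
            response["timestamp"],
            json.dumps(response.get("sources", [])),
        ),
    )
    conn.commit()


def load_query(conn, query_id):
    """Return the stored answer and sources for one query, or None if absent."""
    row = conn.execute(
        "SELECT answer, sources_json FROM audit_log WHERE query_id = ?", (query_id,)
    ).fetchone()
    if row is None:
        return None
    return {"answer": row[0], "sources": json.loads(row[1])}
```

Storing the sources verbatim alongside the answer is what lets a later reviewer see exactly which passages backed a given response.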
## Scale note
The architecture is designed for **high-volume document ingestion** via **async background jobs** (FastAPI `BackgroundTasks`), persistent Chroma collections, and a stateless API tier that can be replicated once you add a shared vector store and job queue.
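The queue-then-poll pattern behind the background jobs can be sketched without any web framework. The example below deliberately swaps FastAPI `BackgroundTasks` for a standard-library `ThreadPoolExecutor` and an in-memory job table; `Job`, `JobRunner`, and the `ingest_one` callback are hypothetical names, and the status fields only loosely follow the `/jobs/{job_id}` description.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from threading import Lock


@dataclass
class Job:
    total_files: int
    processed_files: int = 0
    errors: list = field(default_factory=list)

    @property
    def progress_percent(self):
        if self.total_files == 0:
            return 100.0
        return 100.0 * self.processed_files / self.total_files


class JobRunner:
    """Accept work immediately, run it on a worker thread, expose status."""

    def __init__(self, ingest_one, max_workers=2):
        self._ingest_one = ingest_one  # callable that ingests a single file
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._jobs = {}
        self._lock = Lock()

    def submit(self, filenames):
        """Register a job and return (job_id, future) without blocking."""
        job_id = str(uuid.uuid4())
        with self._lock:
            self._jobs[job_id] = Job(total_files=len(filenames))
        future = self._pool.submit(self._run, job_id, list(filenames))
        return job_id, future

    def _run(self, job_id, filenames):
        for name in filenames:
            try:
                self._ingest_one(name)
            except Exception as exc:
                with self._lock:
                    self._jobs[job_id].errors.append(f"{name}: {exc}")
            # A failed file still counts as processed so progress reaches 100%.
            with self._lock:
                self._jobs[job_id].processed_files += 1

    def status(self, job_id):
        with self._lock:
            job = self._jobs[job_id]
            return {
                "progress_percent": job.progress_percent,
                "processed_files": job.processed_files,
                "total_files": job.total_files,
                "errors": list(job.errors),
            }
```

Because the job table lives in process memory, this sketch has the same limitation the note describes: replicating the API tier requires moving job state into a shared queue or database.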
## Tests
Automated API tests use **pytest** with isolated temp databases; they do **not** require a running server or Ollama.
```bash
uv sync
uv run pytest tests/ -q
```
Full guide (commands, coverage by file, mocks vs manual smoke tests, troubleshooting): **[docs/TESTING.md](docs/TESTING.md)**.
## Configuration
See **`.env.example`**. Common variables include `LLM_PROVIDER`, Ollama/OpenAI/Anthropic keys and models, `CHROMA_PERSIST_DIRECTORY`, `AUDIT_DB_PATH`, `JOBS_DB_PATH`, and upload limits (`MAX_FILE_SIZE_MB`; **`MAX_UPLOAD_SIZE_MB`** is accepted as an alias via settings normalization).
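The alias handling for the upload limit might look roughly like the sketch below. `resolve_max_file_size_mb` is a hypothetical helper, not the project's settings code, and letting `MAX_FILE_SIZE_MB` win when both variables are set is an assumption.

```python
def resolve_max_file_size_mb(env, default_mb):
    """Read the upload limit from `env`, accepting MAX_UPLOAD_SIZE_MB as an alias.

    Assumption: the canonical name takes precedence when both are present.
    """
    for name in ("MAX_FILE_SIZE_MB", "MAX_UPLOAD_SIZE_MB"):
        raw = env.get(name)
        if raw not in (None, ""):
            value = int(raw)
            if value <= 0:
                raise ValueError(f"{name} must be positive, got {value}")
            return value
    return default_mb
```

A typical call would be `resolve_max_file_size_mb(os.environ, default_mb=25)`, where `25` is a placeholder rather than the project's real default.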