title: Document-Audit RAG
emoji: π
colorFrom: blue
colorTo: indigo
sdk: streamlit
sdk_version: 1.39.0
app_file: app.py
DocuAudit AI
DocuAudit AI is a production-oriented FastAPI backend plus an optional Streamlit UI for multi-document RAG (retrieval-augmented generation): upload documents, build a Chroma vector index, ask grounded questions with citations, and retain a SQLite audit trail of every query.
Architecture
flowchart LR
subgraph ingest [Ingestion]
A[PDF / TXT / MD] --> B[Loader]
B --> C[Chunker]
C --> D[Embedder]
D --> E[(ChromaDB)]
end
subgraph query [Query path]
Q[User question] --> R[Semantic search]
R --> E
R --> T[Top-K chunks]
T --> L[LLM]
L --> U[Answer + citations]
end
U --> V[(SQLite audit)]
ASCII equivalent:
PDF Upload → Parser → Chunker → Embedder → ChromaDB
                                              ↓
User Query → Semantic Search → Top-K Chunks → LLM → Answer + Citations
                                              ↓
                                     Audit Log (SQLite)
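The chunk, embed, and top-K steps in the diagram can be illustrated with a small standard-library sketch. It is not the project's implementation: the names `chunk_words`, `embed`, `cosine`, and `top_k` are hypothetical, and a toy bag-of-words vector stands in for the real embedder and for ChromaDB's similarity search.

```python
import math
import re
from collections import Counter


def chunk_words(text: str, size: int = 8, overlap: int = 2) -> list:
    """Split text into overlapping windows of `size` words (stand-in chunker)."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    words = text.split()
    chunks = []
    i = 0
    while i < len(words):
        chunks.append(" ".join(words[i:i + size]))
        if i + size >= len(words):
            break  # the last window already reaches the end of the text
        i += size - overlap
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding: lowercase token counts. A real system uses a neural model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(question: str, chunks: list, k: int = 2) -> list:
    """Return the k (score, chunk) pairs most similar to the question."""
    q = embed(question)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
```

In the real pipeline the selected chunks, together with their document name and page metadata, would be passed to the LLM as context.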
Use cases
- Litigation document analysis: trace claims to exact pages and filenames.
- Corporate finance review: compare disclosures and filings under a consistent audit log.
- Investigation support: bulk ingest, async jobs, and reproducible query history.
Deploying on Hugging Face Spaces
- Set LLM_PROVIDER=huggingface; use HUGGINGFACE_API_KEY and/or the Space secret HF_TOKEN (see .env.example).
- Use root app.py as the Streamlit entry for the default Hub command.
- Hub UI, secrets, hardware, and Streamlit SDK details: Streamlit Spaces, Spaces overview.
- Test locally before deploy: uv run python scripts/verify_huggingface_inference.py (requires LLM_PROVIDER=huggingface in .env).
Quick start with Docker
Requires Docker Engine and Compose v2. The snippet below matches the shipped docker-compose.yml: API on 8000, Streamlit on 8501, with Chroma and SQLite under /data inside the API container. After docker compose up -d, expect curl http://localhost:8000/health to return JSON including "status":"ok".
git clone <repository-url> doc-Audi-ai
cd doc-Audi-ai
cp .env.example .env
# edit .env as needed; for compose Ollama: OLLAMA_BASE_URL=http://ollama:11434
# (with host Ollama: run `ollama serve`; compose defaults to host.docker.internal:11434)
docker compose build
docker compose up -d
curl -s http://localhost:8000/health
# http://localhost:8501 -> Streamlit
docker compose down
Optional all-in-one Ollama in Compose: docker compose --profile ollama up -d (then set OLLAMA_BASE_URL=http://ollama:11434 in .env and recreate containers).
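Containers can take a few seconds to accept connections after docker compose up -d, so a single curl may fail on the first try. The sketch below polls the health endpoint until it reports "status": "ok". The helper names `fetch_health` and `wait_until_healthy` are hypothetical and not part of the repository; only the URL and the status field come from the text above.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_health(url: str = "http://localhost:8000/health", timeout: float = 2.0) -> dict:
    """GET the health endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_until_healthy(
    fetch: Callable[[], dict] = fetch_health,
    attempts: int = 30,
    delay_s: float = 1.0,
) -> bool:
    """Poll until the API reports status 'ok' or the attempts run out."""
    for _ in range(attempts):
        try:
            if fetch().get("status") == "ok":
                return True
        except (OSError, ValueError):
            # OSError covers refused connections and timeouts (URLError is a
            # subclass); ValueError covers a body that is not valid JSON yet.
            pass
        time.sleep(delay_s)
    return False
```

Calling `wait_until_healthy()` with no arguments targets the API port used by the compose file; passing a different `fetch` callable makes the loop easy to exercise without a running server.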
How it works (user workflow)
Collections, ingestion vs querying, jobs vs audit, Streamlit tabs, and per-button UI flows: docs/USER_WORKFLOW.md.
Run and test (step-by-step)
For ingestion formats, URL rules, job polling, a walkthrough using the sample file sample.txt, curl/PowerShell examples, and troubleshooting, see docs/RUN_AND_TEST_GUIDE.md.
For SQLite vs Memcached, offline DB inspection, and the Cursor SQLite Viewer extension (qwtel.sqlite-viewer), see docs/SQLITE_AND_DB_INSPECTION.md.
Quick start (local, without Docker)
Run the API with uv (or your preferred tool):
git clone <repository-url> doc-Audi-ai
cd doc-Audi-ai
cp .env.example .env
uv sync
ollama pull llama3.1:8b
ollama pull nomic-embed-text
uv run uvicorn api.main:app --host 0.0.0.0 --port 8000 --reload
# Alternative to the command above: restrict the reload watcher to the api and storage directories
uv run uvicorn api.main:app --host 0.0.0.0 --port 8000 --reload --reload-dir api --reload-dir storage
Optional UI:
uv run streamlit run streamlit_app.py --server.port 8501 --server.address 0.0.0.0
API overview
| Method | Path | Description |
|---|---|---|
| GET | /health | Liveness; returns configured app name and version |
| POST | /ingest/upload | Multipart files (one or more); queues background ingest job |
| POST | /ingest/url | JSON urls array (1-100); download and queue ingest |
| GET | /ingest/collections | Lists collections with document_count and optional created_at |
| DELETE | /ingest/collection/{collection_name} | Drops a collection; returns documents_removed |
| GET | /jobs | Lists jobs with total count |
| GET | /jobs/{job_id} | Job status with progress_percent, file counters, timestamps, errors |
| POST | /query/ask | Grounded answer; request includes top_k, user_id |
| POST | /query/summarise | Collection summary; distinct response shape (summary, document_count, …) |
| POST | /query | Legacy alias of /query/ask |
| GET | /audit/logs | Filterable audit index (user_id, from_date, to_date, pagination) |
| GET | /audit/logs/{query_id} | Full stored answer and citations for one query |
Interactive docs: http://localhost:8000/docs.
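As an illustration of the filterable audit index, the sketch below builds a GET /audit/logs URL from the documented filter names user_id, from_date, and to_date. The helper `audit_logs_url` is hypothetical, and the ISO date format is an assumption. Pagination parameter names are not given in the table, so they are left out; check the interactive docs for the exact query schema.

```python
from datetime import date
from typing import Optional
from urllib.parse import urlencode


def audit_logs_url(
    base_url: str,
    user_id: Optional[str] = None,
    from_date: Optional[date] = None,
    to_date: Optional[date] = None,
) -> str:
    """Build the audit index URL, including only the filters that were supplied."""
    params = {}
    if user_id:
        params["user_id"] = user_id
    if from_date:
        params["from_date"] = from_date.isoformat()  # assumed format: YYYY-MM-DD
    if to_date:
        params["to_date"] = to_date.isoformat()
    query = urlencode(params)
    url = f"{base_url.rstrip('/')}/audit/logs"
    return f"{url}?{query}" if query else url
```

The resulting string can be passed to curl or to any HTTP client.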
Sample request and response (POST /query/ask)
Request:
{
"question": "What were the key risk factors identified in the Q3 2023 financial report?",
"collection_name": "default",
"top_k": 5,
"user_id": "analyst_001"
}
Response (shape; values depend on your documents and model):
{
"query_id": "uuid-string",
"question": "What were the key risk factors identified in the Q3 2023 financial report?",
"answer": "… grounded text with citations …",
"sources": [
{
"document_name": "q3_financial_report.pdf",
"page_number": 12,
"chunk_text": "Key risk factors include …",
"relevance_score": 0.91
}
],
"model_used": "llama3.1:8b",
"tokens_used": 0,
"response_time_ms": 1820,
"timestamp": "2026-05-03T12:00:00Z"
}
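A client might turn the sources array into readable citations. The sketch below parses a response of the shape shown above; the `Source` dataclass and the `parse_sources` and `format_citation` helpers are hypothetical names, and `page_number` is treated as optional on the assumption that some formats (for example plain text) carry no page.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Source:
    document_name: str
    page_number: Optional[int]
    chunk_text: str
    relevance_score: float


def parse_sources(response_json: str) -> list:
    """Extract the cited sources from a /query/ask response body."""
    payload = json.loads(response_json)
    return [
        Source(
            document_name=item["document_name"],
            page_number=item.get("page_number"),  # may be absent or null
            chunk_text=item["chunk_text"],
            relevance_score=float(item["relevance_score"]),
        )
        for item in payload.get("sources", [])
    ]


def format_citation(src: Source) -> str:
    """Render one source as 'file, p. N (score X.XX)'."""
    page = f", p. {src.page_number}" if src.page_number is not None else ""
    return f"{src.document_name}{page} (score {src.relevance_score:.2f})"
```

For the sample response above, `format_citation` yields `q3_financial_report.pdf, p. 12 (score 0.91)`.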
Design decisions
- Source citations: High-stakes review requires every substantive claim to be tied to document name and page (where available), not a free-floating model monologue.
- Auditability: Each ask/summarise persists query id, user id, timing, model id, token usage (when the provider exposes it), and serialized sources so regulators or counsel can reconstruct what the system returned.
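The auditability decision can be sketched with Python's built-in sqlite3 module. The table name, column names, and helper functions below are hypothetical and only mirror the fields listed above; the project's actual schema may differ.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone
from typing import Optional

# Hypothetical schema; the real audit table may use different names and types.
SCHEMA = """
CREATE TABLE IF NOT EXISTS audit_log (
    query_id         TEXT PRIMARY KEY,
    user_id          TEXT NOT NULL,
    question         TEXT NOT NULL,
    answer           TEXT NOT NULL,
    sources_json     TEXT NOT NULL,
    model_used       TEXT NOT NULL,
    tokens_used      INTEGER,          -- NULL when the provider reports no usage
    response_time_ms INTEGER NOT NULL,
    timestamp        TEXT NOT NULL
)
"""


def open_audit_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the audit database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_query(conn, user_id, question, answer, sources, model_used,
                 response_time_ms, tokens_used=None) -> str:
    """Persist one answered query and return its generated query id."""
    query_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO audit_log VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
        (query_id, user_id, question, answer, json.dumps(sources), model_used,
         tokens_used, response_time_ms, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()
    return query_id


def load_query(conn, query_id: str) -> Optional[dict]:
    """Reconstruct what was returned for one query, or None if unknown."""
    row = conn.execute(
        "SELECT user_id, answer, sources_json FROM audit_log WHERE query_id = ?",
        (query_id,),
    ).fetchone()
    if row is None:
        return None
    return {"user_id": row[0], "answer": row[1], "sources": json.loads(row[2])}
```

Storing the sources as serialized JSON next to the answer keeps each row self-contained, so a reviewer can replay a result even if the vector index has changed since.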
Scale note
The architecture supports high-volume document ingestion through async background jobs (FastAPI BackgroundTasks) and persistent Chroma collections. The API tier is stateless, so it can be replicated once you add a shared vector store and job queue.
Tests
Automated API tests use pytest with isolated temp databases; they do not require a running server or Ollama.
uv sync
uv run pytest tests/ -q
Full guide (commands, coverage by file, mocks vs manual smoke tests, troubleshooting): docs/TESTING.md.
Configuration
See .env.example. Common variables include LLM_PROVIDER, Ollama/OpenAI/Anthropic keys and models, CHROMA_PERSIST_DIRECTORY, AUDIT_DB_PATH, JOBS_DB_PATH, and upload limits (MAX_FILE_SIZE_MB; MAX_UPLOAD_SIZE_MB is accepted as an alias via settings normalization).
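The alias handling for the upload limit could look roughly like the sketch below. It is not the project's settings code: the function name, the default of 50 MB, and the rule that MAX_FILE_SIZE_MB wins when both variables are set are all assumptions made for illustration.

```python
from typing import Mapping

DEFAULT_MAX_FILE_SIZE_MB = 50  # hypothetical default, not taken from the project


def resolve_max_file_size_mb(env: Mapping[str, str]) -> int:
    """Return the upload limit in MB, accepting MAX_UPLOAD_SIZE_MB as an alias.

    Assumption: the canonical MAX_FILE_SIZE_MB takes precedence over the alias.
    """
    raw = env.get("MAX_FILE_SIZE_MB") or env.get("MAX_UPLOAD_SIZE_MB")
    if not raw:
        return DEFAULT_MAX_FILE_SIZE_MB
    value = int(raw)  # raises ValueError for non-numeric input
    if value <= 0:
        raise ValueError("upload limit must be a positive number of megabytes")
    return value
```

Passing `os.environ` as `env` reads the live process environment; passing a plain dict keeps the function easy to test.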