Mayank Chugh

metadata
title: Document-Audit RAG
emoji: 📑
colorFrom: blue
colorTo: indigo
sdk: streamlit
sdk_version: 1.39.0
app_file: app.py

DocuAudit AI

DocuAudit AI is a production-oriented FastAPI backend plus optional Streamlit UI for multi-document RAG: upload documents, build a Chroma vector index, ask grounded questions with citations, and retain a SQLite audit trail of every query.

Architecture

flowchart LR
  subgraph ingest [Ingestion]
    A[PDF / TXT / MD] --> B[Loader]
    B --> C[Chunker]
    C --> D[Embedder]
    D --> E[(ChromaDB)]
  end
  subgraph query [Query path]
    Q[User question] --> R[Semantic search]
    R --> E
    R --> T[Top-K chunks]
    T --> L[LLM]
    L --> U[Answer + citations]
  end
  U --> V[(SQLite audit)]

ASCII equivalent:

PDF Upload → Parser → Chunker → Embedder → ChromaDB
                                              ↓
User Query → Semantic Search → Top-K Chunks → LLM → Answer + Citations
                                              ↓
                                       Audit Log (SQLite)
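
As a rough illustration of the two paths in the diagram, the sketch below chunks text, embeds each chunk, and ranks chunks against a question by cosine similarity. It is not the project's code: `chunk_text`, `embed`, `cosine`, and `top_k` are hypothetical names, and the bag-of-words `embed` is a toy stand-in for the real embedding model and ChromaDB.

```python
import math
from collections import Counter


def chunk_text(text: str, size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into overlapping word windows (requires size > overlap)."""
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a real system calls an embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(question: str, chunks: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Return the k chunks most similar to the question, best first."""
    query_vec = embed(question)
    scored = [(cosine(query_vec, embed(chunk)), chunk) for chunk in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
```

In the real pipeline the ranked chunks would be passed to the LLM as context, and the resulting answer and citations written to the audit log.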

Use cases

  • Litigation document analysis: trace claims to exact pages and filenames.
  • Corporate finance review: compare disclosures and filings under a consistent audit log.
  • Investigation support: bulk ingest, async jobs, and reproducible query history.

Deploying on Hugging Face Spaces

  • Set LLM_PROVIDER=huggingface; use HUGGINGFACE_API_KEY and/or the Space secret HF_TOKEN (see .env.example).
  • Use root app.py as the Streamlit entry for the default Hub command.
  • For Hub UI, secrets, hardware, and Streamlit SDK details, see the Streamlit Spaces and Spaces overview pages in the Hugging Face documentation.
  • Test locally before deploy: uv run python scripts/verify_huggingface_inference.py (requires LLM_PROVIDER=huggingface in .env).
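
One way to handle the "and/or" in the first bullet is to prefer HUGGINGFACE_API_KEY and fall back to HF_TOKEN. The sketch below assumes that precedence; the function name `resolve_hf_token` is hypothetical, and the project's actual lookup order may differ (see .env.example).

```python
import os
from typing import Mapping, Optional


def resolve_hf_token(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the first non-empty token, or None if neither variable is set.

    Assumed precedence (not confirmed by the project):
    HUGGINGFACE_API_KEY first, then the Space secret HF_TOKEN.
    """
    for name in ("HUGGINGFACE_API_KEY", "HF_TOKEN"):
        value = env.get(name, "").strip()
        if value:
            return value
    return None
```

Passing the environment as a mapping keeps the function easy to check without touching real secrets.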

Quick start with Docker

Requires Docker Engine and Compose v2. The snippet below matches the shipped docker-compose.yml: API on 8000, Streamlit on 8501, with Chroma and SQLite under /data inside the API container. After docker compose up -d, expect curl http://localhost:8000/health to return JSON including "status":"ok".

git clone <repository-url> doc-Audi-ai
cd doc-Audi-ai
cp .env.example .env
# edit .env as needed; for compose Ollama: OLLAMA_BASE_URL=http://ollama:11434
# (with host Ollama: run `ollama serve`; compose defaults to host.docker.internal:11434)

docker compose build
docker compose up -d
curl -s http://localhost:8000/health
# Streamlit UI: http://localhost:8501
docker compose down

Optional all-in-one Ollama in Compose: docker compose --profile ollama up -d (then set OLLAMA_BASE_URL=http://ollama:11434 in .env and recreate containers).
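
Containers can take a few seconds to accept connections after `docker compose up -d`, so a single curl may fail at first. A minimal sketch of a readiness poll follows, assuming only what is stated above (the /health endpoint returns JSON that includes "status":"ok"); the helper names, attempt count, and delay are illustrative.

```python
import json
import time
import urllib.error
import urllib.request


def is_healthy(body: bytes) -> bool:
    """True when the /health payload is a JSON object with status == "ok"."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def wait_for_api(url: str = "http://localhost:8000/health",
                 attempts: int = 30, delay_s: float = 1.0) -> bool:
    """Poll the health endpoint until it reports ok or the attempts run out."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if is_healthy(resp.read()):
                    return True
        except (urllib.error.URLError, OSError):
            pass  # API is not accepting connections yet
        time.sleep(delay_s)
    return False
```

Calling `wait_for_api()` from a script after starting the stack returns True once the API responds, or False after roughly 30 seconds with the defaults shown.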

How it works (user workflow)

Collections, ingestion vs querying, jobs vs audit, Streamlit tabs, and per-button UI flows: docs/USER_WORKFLOW.md.

Run and test (step-by-step)

For ingestion formats, URL rules, job polling, a walkthrough using sample.txt, curl/PowerShell examples, and troubleshooting, see docs/RUN_AND_TEST_GUIDE.md.

For SQLite vs Memcached, offline DB inspection, and the Cursor SQLite Viewer extension (qwtel.sqlite-viewer), see docs/SQLITE_AND_DB_INSPECTION.md.

Quick start (local, without Docker)

Run the API with uv (or your preferred tool):

git clone <repository-url> doc-Audi-ai
cd doc-Audi-ai
cp .env.example .env
uv sync
ollama pull llama3.1:8b
ollama pull nomic-embed-text
uv run uvicorn api.main:app --host 0.0.0.0 --port 8000 --reload

# Variant: watch only the api and storage directories for reloads
uv run uvicorn api.main:app --host 0.0.0.0 --port 8000 --reload --reload-dir api --reload-dir storage

Optional UI:

uv run streamlit run streamlit_app.py --server.port 8501 --server.address 0.0.0.0

API overview

| Method | Path | Description |
| --- | --- | --- |
| GET | /health | Liveness; returns configured app name and version |
| POST | /ingest/upload | Multipart files (one or more); queues background ingest job |
| POST | /ingest/url | JSON urls array (1–100); download and queue ingest |
| GET | /ingest/collections | Lists collections with document_count and optional created_at |
| DELETE | /ingest/collection/{collection_name} | Drops a collection; returns documents_removed |
| GET | /jobs | Lists jobs with total count |
| GET | /jobs/{job_id} | Job status with progress_percent, file counters, timestamps, errors |
| POST | /query/ask | Grounded answer; request includes top_k, user_id |
| POST | /query/summarise | Collection summary; distinct response shape (summary, document_count, …) |
| POST | /query | Legacy alias of /query/ask |
| GET | /audit/logs | Filterable audit index (user_id, from_date, to_date, pagination) |
| GET | /audit/logs/{query_id} | Full stored answer and citations for one query |

Interactive docs: http://localhost:8000/docs.
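
Because ingestion runs as a background job, a client typically polls GET /jobs/{job_id} until the job finishes. The sketch below shows one way to do that. The `status` field name and the terminal values "completed" and "failed" are assumptions, as are the helper names; check the interactive docs for the actual response schema.

```python
import json
import time
import urllib.request
from typing import Callable

# Assumed terminal values; the README only says the response carries a job status.
TERMINAL_STATES = {"completed", "failed"}


def fetch_job(job_id: str, base_url: str = "http://localhost:8000") -> dict:
    """GET /jobs/{job_id} and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/jobs/{job_id}", timeout=10) as resp:
        return json.load(resp)


def wait_for_job(job_id: str, fetch: Callable[[str], dict] = fetch_job,
                 interval_s: float = 2.0, max_polls: int = 150) -> dict:
    """Poll until the job reaches an assumed terminal state, then return it."""
    for _ in range(max_polls):
        job = fetch(job_id)
        if job.get("status") in TERMINAL_STATES:
            return job
        time.sleep(interval_s)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

The `fetch` parameter is injectable so the loop can be exercised without a running server.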

Sample request and response (POST /query/ask)

Request:

{
  "question": "What were the key risk factors identified in the Q3 2023 financial report?",
  "collection_name": "default",
  "top_k": 5,
  "user_id": "analyst_001"
}

Response (shape; values depend on your documents and model):

{
  "query_id": "uuid-string",
  "question": "What were the key risk factors identified in the Q3 2023 financial report?",
  "answer": "… grounded text with citations …",
  "sources": [
    {
      "document_name": "q3_financial_report.pdf",
      "page_number": 12,
      "chunk_text": "Key risk factors include …",
      "relevance_score": 0.91
    }
  ],
  "model_used": "llama3.1:8b",
  "tokens_used": 0,
  "response_time_ms": 1820,
  "timestamp": "2026-05-03T12:00:00Z"
}
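
A client may want typed access to the response rather than raw dictionaries. The sketch below parses the shape shown above into dataclasses and formats one citation line per source. The class and function names are hypothetical; only the JSON field names come from the sample response.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Source:
    document_name: str
    page_number: Optional[int]  # page is only present where available
    chunk_text: str
    relevance_score: float


@dataclass
class AskResult:
    query_id: str
    answer: str
    sources: List[Source]
    model_used: str
    response_time_ms: int


def parse_ask_response(raw: str) -> AskResult:
    """Decode a /query/ask response body into typed objects."""
    data = json.loads(raw)
    sources = [
        Source(s["document_name"], s.get("page_number"), s["chunk_text"], s["relevance_score"])
        for s in data.get("sources", [])
    ]
    return AskResult(data["query_id"], data["answer"], sources,
                     data["model_used"], data["response_time_ms"])


def format_citations(result: AskResult) -> List[str]:
    """One human-readable line per source, with the page when known."""
    lines = []
    for s in result.sources:
        page = f", p. {s.page_number}" if s.page_number is not None else ""
        lines.append(f"{s.document_name}{page} (score {s.relevance_score:.2f})")
    return lines
```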

Design decisions

  • Source citations: High-stakes review requires every substantive claim to be tied to document name and page (where available), not a free-floating model monologue.
  • Auditability: Each ask/summarise persists query id, user id, timing, model id, token usage (when the provider exposes it), and serialized sources so regulators or counsel can reconstruct what the system returned.
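
To make the auditability point concrete, here is a minimal sketch of persisting and reloading one query with the standard-library `sqlite3` module. The table name, column names, and helper functions are hypothetical and are not taken from the project's schema; they only mirror the fields listed above.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone
from typing import Optional

# Hypothetical schema mirroring the fields listed in the design decisions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS audit_log (
    query_id TEXT PRIMARY KEY,
    user_id TEXT NOT NULL,
    question TEXT NOT NULL,
    answer TEXT NOT NULL,
    sources_json TEXT NOT NULL,
    model_used TEXT NOT NULL,
    tokens_used INTEGER,          -- NULL when the provider does not expose usage
    response_time_ms INTEGER NOT NULL,
    created_at TEXT NOT NULL
)
"""


def record_query(conn: sqlite3.Connection, *, user_id: str, question: str, answer: str,
                 sources: list, model_used: str, response_time_ms: int,
                 tokens_used: Optional[int] = None) -> str:
    """Insert one audit row and return its generated query id."""
    conn.execute(SCHEMA)
    query_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO audit_log VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
        (query_id, user_id, question, answer, json.dumps(sources), model_used,
         tokens_used, response_time_ms, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()
    return query_id


def load_query(conn: sqlite3.Connection, query_id: str) -> Optional[dict]:
    """Reconstruct what was returned for a given query id, or None if unknown."""
    row = conn.execute(
        "SELECT user_id, answer, sources_json FROM audit_log WHERE query_id = ?",
        (query_id,),
    ).fetchone()
    if row is None:
        return None
    return {"user_id": row[0], "answer": row[1], "sources": json.loads(row[2])}
```

Storing the sources as serialized JSON next to the answer keeps each row self-contained, so a reviewer can see what was cited even if the underlying collection is later dropped.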

Scale note

The architecture is designed for high-volume document ingestion through asynchronous background jobs (FastAPI BackgroundTasks) and persistent Chroma collections. The API tier is stateless, so it can be replicated once you add a shared vector store and job queue.

Tests

Automated API tests use pytest with isolated temp databases; they do not require a running server or Ollama.

uv sync
uv run pytest tests/ -q

Full guide (commands, coverage by file, mocks vs manual smoke tests, troubleshooting): docs/TESTING.md.
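
The "isolated temp databases" pattern can be sketched with pytest's built-in `tmp_path` fixture, which gives each test its own temporary directory. The `jobs` table and the `open_jobs_db` helper below are hypothetical and only illustrate the idea; the project's real fixtures are described in docs/TESTING.md.

```python
import sqlite3
from pathlib import Path


def open_jobs_db(path: Path) -> sqlite3.Connection:
    """Open (and initialise) a SQLite database at the given path."""
    conn = sqlite3.connect(str(path))
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs (job_id TEXT PRIMARY KEY, status TEXT NOT NULL)"
    )
    return conn


def test_jobs_db_is_isolated(tmp_path: Path) -> None:
    # tmp_path is unique per test, so no state leaks between tests or runs.
    db_file = tmp_path / "jobs.db"
    conn = open_jobs_db(db_file)
    conn.execute("INSERT INTO jobs VALUES (?, ?)", ("job-1", "queued"))
    conn.commit()
    count = conn.execute("SELECT COUNT(*) FROM jobs").fetchone()[0]
    conn.close()
    assert count == 1
    assert db_file.exists()
```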

Configuration

See .env.example. Common variables include LLM_PROVIDER, Ollama/OpenAI/Anthropic keys and models, CHROMA_PERSIST_DIRECTORY, AUDIT_DB_PATH, JOBS_DB_PATH, and upload limits (MAX_FILE_SIZE_MB; MAX_UPLOAD_SIZE_MB is accepted as an alias via settings normalization).
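
The alias handling for the upload limit could look roughly like the sketch below. It assumes that MAX_FILE_SIZE_MB wins when both variables are set and uses a placeholder default of 50 MB; neither detail is confirmed by the project, and the function name is hypothetical.

```python
from typing import Mapping

DEFAULT_MAX_FILE_SIZE_MB = 50  # placeholder default, not taken from the project


def max_file_size_mb(env: Mapping[str, str]) -> int:
    """Resolve the upload limit, accepting MAX_UPLOAD_SIZE_MB as an alias.

    Assumed precedence: the canonical MAX_FILE_SIZE_MB is checked first.
    """
    for name in ("MAX_FILE_SIZE_MB", "MAX_UPLOAD_SIZE_MB"):
        raw = env.get(name, "").strip()
        if raw:
            value = int(raw)  # raises ValueError on non-numeric input
            if value <= 0:
                raise ValueError(f"{name} must be a positive integer, got {raw!r}")
            return value
    return DEFAULT_MAX_FILE_SIZE_MB
```

For example, `max_file_size_mb({"MAX_UPLOAD_SIZE_MB": "25"})` resolves the alias to 25.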