---
title: SecureAgentRAG API
emoji: 🛡️
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false
short_description: Privacy-first multi-agent RAG (BYOK demo)
---

# SecureAgentRAG API

Production backend for the [SecureAgentRAG](https://github.com/moazmo/secureagentrag) public demo.

- **Frontend:** https://secureagentrag-web.vercel.app
- **Source:** https://github.com/moazmo/secureagentrag (branch `deploy/prod-launch`)
- **License:** MIT

This Space hosts the FastAPI surface only. The Streamlit UI on `main` remains for local development; recruiters interact with the platform via the Next.js frontend deployed on Vercel.

## Mode

Runs in BYOK (Bring Your Own Key) mode:

- `POST /byok/chat` accepts visitor-supplied LLM credentials via headers
- `POST /byok/chat/stream` is the SSE variant that surfaces phase / token / blocked / final events for the live trace UI
- `GET /byok/audit` returns the visitor's last 50 PII-redacted audit entries so the frontend can display the SHA-256 chain
- Owner-key fallback is throttled to 3 requests per IP per hour (Groq free tier protection) and consults `X-Forwarded-For` first so the throttle is not bypassed by HF's reverse proxy
- Each visitor gets a session-scoped Qdrant collection that auto-purges every 24 hours
- Phoenix instrumentation is hard-disabled (no third-party telemetry sees prompts or keys)
- Every audit-log persist runs through `utils.pii.redact` with regression tests for the Groq / OpenAI / Anthropic / HF / Vercel / Qdrant JWT shapes
- `SAR_ALLOW_CLOUD_FOR_HIGH=true` -- HIGH-sensitivity content is allowed to synthesize on the cloud LLM since this deploy has no local Ollama. The frontend renders a "sensitive: routed to cloud" badge on those answers.
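
The owner-key throttle described above (3 requests per IP per hour, with `X-Forwarded-For` consulted before the socket address) could be sketched roughly as follows. This is a minimal in-memory illustration, not the code that runs in this Space: `client_ip`, `OwnerKeyThrottle`, and the choice to trust the first `X-Forwarded-For` hop are all assumptions made for the example.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Mapping, Optional

WINDOW_SECONDS = 3600  # one hour, per the throttle described above
MAX_REQUESTS = 3       # owner-key budget per IP per window


def client_ip(headers: Mapping[str, str], peer_addr: str) -> str:
    """Prefer the first X-Forwarded-For hop; fall back to the socket peer.

    Assumes `headers` uses lowercase keys and that the first hop is the
    visitor's address (an assumption about the proxy, not a confirmed fact).
    """
    forwarded = headers.get("x-forwarded-for", "")
    first_hop = forwarded.split(",")[0].strip()
    return first_hop or peer_addr


class OwnerKeyThrottle:
    """Hypothetical sliding-window limiter keyed by client IP."""

    def __init__(self, limit: int = MAX_REQUESTS, window: float = WINDOW_SECONDS) -> None:
        self.limit = limit
        self.window = window
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

Without the header lookup, every visitor behind the reverse proxy would share the proxy's address and therefore a single three-request budget. A production deployment would likely keep the counters in shared storage rather than process memory.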

## Demo personas

| Persona    | Clearance | Roles                              | Sees                                                                                |
|------------|-----------|------------------------------------|-------------------------------------------------------------------------------------|
| engineer   | 2 (med)   | engineering                        | public handbook, eng runbook, incident runbook, infra ADR, ML model card, NIST RMF  |
| compliance | 3 (high)  | compliance, legal                  | public handbook, security policy, finance Q3, vendor MSA, ML model card, NIST, HR   |
| executive  | 3 (high)  | executive, compliance, engineering | union of the above                                                                  |

The RBAC filter is enforced at the Qdrant payload layer (`org_id` keyword + `sensitivity_level_int` range + `roles` match-any). Chunks the persona is not authorised to see are physically not returned, regardless of cosine-similarity score.

## Endpoints

| Path                     | Purpose                                                       |
|--------------------------|---------------------------------------------------------------|
| `GET /healthz`           | Liveness probe (used by GitHub Actions keepalive cron)        |
| `GET /readyz`            | Readiness -- pings Qdrant Cloud + Groq (Ollama skipped here)  |
| `POST /byok/chat`        | Public-demo chat (BYOK or throttled owner-key)                |
| `POST /byok/chat/stream` | SSE variant -- emits phase / token / blocked / final events   |
| `GET /byok/audit`        | Session-scoped audit export (PII redacted, SHA-256 chained)   |
| `POST /query`            | Authenticated JWT endpoint (dev / staging compat)             |

## Operator notes

- 600+ tests passing on the source repo at the commit pinned in `private/roadmap.md`.
- Built from `Dockerfile.hf` in the source tree -- this Space copy is renamed to `Dockerfile` so HF picks it up automatically.
- CPU Basic hardware (2 vCPU, 16 GB RAM). Cold cross-encoder load adds ~5 s to the first request after wake; subsequent queries answer in <1 s end-to-end against the Vercel frontend.
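
For operators who want to reason about the RBAC filter from the Demo personas section, the three payload conditions (`org_id` keyword, `sensitivity_level_int` range, `roles` match-any) might be expressed as a Qdrant-style filter dictionary like the one below. This is a sketch under assumptions: `build_rbac_filter`, `chunk_visible`, and the `demo-org` value are hypothetical, and reading the range as "chunk sensitivity at or below the persona's clearance" is an inference from the persona table rather than something the source states.

```python
from typing import Any, Dict, List


def build_rbac_filter(org_id: str, clearance: int, roles: List[str]) -> Dict[str, Any]:
    """Build a Qdrant-style filter; every condition under `must` has to hold."""
    return {
        "must": [
            {"key": "org_id", "match": {"value": org_id}},
            # Assumption: a chunk is visible when its level <= persona clearance.
            {"key": "sensitivity_level_int", "range": {"lte": clearance}},
            {"key": "roles", "match": {"any": roles}},
        ]
    }


def chunk_visible(payload: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """Local stand-in for the server-side check, for illustration only."""
    for cond in flt["must"]:
        value = payload.get(cond["key"])
        if "range" in cond:
            if value is None or value > cond["range"]["lte"]:
                return False
        elif "value" in cond["match"]:
            if value != cond["match"]["value"]:
                return False
        else:
            # match-any: at least one of the chunk's roles must be allowed.
            if not set(value or []) & set(cond["match"]["any"]):
                return False
    return True


# Hypothetical engineer persona: clearance 2, role "engineering".
engineer_filter = build_rbac_filter("demo-org", 2, ["engineering"])
```

Because the conditions are applied as a payload filter at query time, a chunk that fails any of them is never a candidate for similarity scoring, which is consistent with the "physically not returned" behaviour described earlier.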