title: SecureAgentRAG API
emoji: 🛡️
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false
short_description: Privacy-first multi-agent RAG (BYOK demo)
SecureAgentRAG API
Production backend for the SecureAgentRAG public demo.
- Frontend: https://secureagentrag-web.vercel.app
- Source: https://github.com/moazmo/secureagentrag (branch `deploy/prod-launch`)
- License: MIT
This Space hosts the FastAPI surface only. The Streamlit UI on main
remains for local development; recruiters interact with the platform via
the Next.js frontend deployed on Vercel.
Mode
Runs in BYOK (Bring Your Own Key) mode:
- `POST /byok/chat` accepts visitor-supplied LLM credentials via headers
- `POST /byok/chat/stream` is the SSE variant that surfaces phase / token / blocked / final events for the live trace UI
- `GET /byok/audit` returns the visitor's last 50 PII-redacted audit entries so the frontend can display the SHA-256 chain
- Owner-key fallback is throttled to 3 requests per IP per hour (Groq free tier protection) and consults `X-Forwarded-For` first so the throttle is not bypassed by HF's reverse proxy
- Each visitor gets a session-scoped Qdrant collection that auto-purges every 24 hours
- Phoenix instrumentation is hard-disabled (no third-party telemetry sees prompts or keys)
- Every audit-log persist runs through `utils.pii.redact` with regression tests for the Groq / OpenAI / Anthropic / HF / Vercel / Qdrant JWT shapes
- `SAR_ALLOW_CLOUD_FOR_HIGH=true` -- HIGH-sensitivity content is allowed to synthesize on the cloud LLM since this deploy has no local Ollama. The frontend renders a "sensitive: routed to cloud" badge on those answers.
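The owner-key throttle described above can be illustrated with a minimal sliding-window sketch. It is not the Space's actual implementation: the names `client_ip` and `SlidingWindowThrottle` are hypothetical, the header mapping is assumed to use lowercase keys, and state is held in process memory. Only the limits (3 requests per IP per hour) and the `X-Forwarded-For`-first lookup come from the text.

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 3600  # one hour, per the README
MAX_REQUESTS = 3       # owner-key requests allowed per IP per window


def client_ip(headers, peer_ip):
    """Prefer the first X-Forwarded-For entry; fall back to the socket peer.

    Assumes `headers` is a dict with lowercase keys (hypothetical shape).
    """
    forwarded = headers.get("x-forwarded-for", "")
    first = forwarded.split(",")[0].strip()
    return first or peer_ip


class SlidingWindowThrottle:
    """In-memory per-IP limiter (hypothetical helper, not the real code)."""

    def __init__(self, limit=MAX_REQUESTS, window=WINDOW_SECONDS,
                 clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self._hits = defaultdict(deque)  # ip -> timestamps of allowed hits

    def allow(self, ip):
        now = self.clock()
        hits = self._hits[ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

Behind a reverse proxy the socket peer is the proxy itself, so keying on the peer address would lump every visitor into one bucket; reading the forwarded header first avoids that. The header is only trustworthy when the proxy in front of the app sets it.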
Demo personas
| Persona | Clearance | Roles | Sees |
|---|---|---|---|
| engineer | 2 (med) | engineering | public handbook, eng runbook, incident runbook, infra ADR, ML model card, NIST RMF |
| compliance | 3 (high) | compliance, legal | public handbook, security policy, finance Q3, vendor MSA, ML model card, NIST, HR |
| executive | 3 (high) | executive, compliance, engineering | union of the above |
The RBAC filter is enforced at the Qdrant payload layer (org_id keyword +
sensitivity_level_int range + roles match-any). Chunks the persona is
not authorised to see are physically not returned, regardless of
cosine-similarity score.
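As a rough illustration of the three conditions named above, the sketch below builds a filter in the JSON shape that Qdrant's REST API accepts and evaluates it against a payload in plain Python. The function names, the `demo-org` value, and the assumption that the range condition means "chunk sensitivity is at most the persona's clearance" are illustrative, not taken from the source repo.

```python
def build_rbac_filter(org_id, clearance, roles):
    """Return a Qdrant-style filter dict: all three conditions must hold."""
    return {
        "must": [
            {"key": "org_id", "match": {"value": org_id}},
            # Assumption: a chunk is visible when its level <= clearance.
            {"key": "sensitivity_level_int", "range": {"lte": clearance}},
            {"key": "roles", "match": {"any": list(roles)}},
        ]
    }


def payload_passes(payload, flt):
    """Evaluate the filter locally, mirroring what the vector store would do."""
    for cond in flt["must"]:
        value = payload.get(cond["key"])
        if "range" in cond:
            if value is None or value > cond["range"]["lte"]:
                return False
        elif "any" in cond["match"]:
            values = value if isinstance(value, list) else [value]
            if not set(values) & set(cond["match"]["any"]):
                return False
        elif value != cond["match"]["value"]:
            return False
    return True


# Hypothetical engineer persona: clearance 2, role "engineering".
engineer_filter = build_rbac_filter("demo-org", 2, ["engineering"])
```

Because the filter is applied before similarity ranking, a chunk that fails any condition never enters the candidate set, which is why a high cosine score cannot leak it.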
Endpoints
| Path | Purpose |
|---|---|
| `GET /healthz` | Liveness probe (used by GitHub Actions keepalive cron) |
| `GET /readyz` | Readiness -- pings Qdrant Cloud + Groq (Ollama skipped here) |
| `POST /byok/chat` | Public-demo chat (BYOK or throttled owner-key) |
| `POST /byok/chat/stream` | SSE variant -- emits phase / token / blocked / final events |
| `GET /byok/audit` | Session-scoped audit export (PII redacted, SHA-256 chained) |
| `POST /query` | Authenticated JWT endpoint (dev / staging compat) |
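For readers unfamiliar with hash-chained audit logs, here is a minimal sketch of the general technique. The entry layout (`record`, `prev_hash`, `hash`), the all-zero genesis value, and the function names are assumptions for illustration; the actual format returned by the audit endpoint may differ.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash for the first entry


def entry_hash(prev_hash, record):
    """Hash the previous entry's hash together with a canonical record dump."""
    body = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(chain, record):
    """Append a record, linking it to the hash of the entry before it."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({
        "record": record,
        "prev_hash": prev,
        "hash": entry_hash(prev, record),
    })


def verify_chain(chain):
    """Recompute every link; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev:
            return False
        if entry["hash"] != entry_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True
```

A client that fetches the export could run a check like `verify_chain` locally, since each hash depends on every entry before it.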
Operator notes
- 600+ tests passing on the source repo at the commit pinned in `private/roadmap.md`.
- Built from `Dockerfile.hf` in the source tree -- this Space copy is renamed to `Dockerfile` so HF picks it up automatically.
- CPU Basic hardware (2 vCPU, 16 GB RAM). Cold cross-encoder load adds ~5 s to the first request after wake; subsequent queries answer in <1 s end-to-end against the Vercel frontend.