metadata
title: SecureAgentRAG API
emoji: 🛡️
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false
short_description: Privacy-first multi-agent RAG (BYOK demo)

SecureAgentRAG API

Production backend for the SecureAgentRAG public demo.

This Space hosts only the FastAPI surface. The Streamlit UI on main remains for local development; recruiters reach the platform through the Next.js frontend deployed on Vercel.

Mode

Runs in BYOK (Bring Your Own Key) mode:

  • POST /byok/chat accepts visitor-supplied LLM credentials via headers
  • POST /byok/chat/stream is the SSE variant that surfaces phase / token / blocked / final events for the live trace UI
  • GET /byok/audit returns the visitor's last 50 PII-redacted audit entries so the frontend can display the SHA-256 chain
  • Owner-key fallback is throttled to 3 requests per IP per hour (Groq free tier protection). The client IP is read from X-Forwarded-For first, so HF's reverse proxy does not defeat the per-IP throttle
  • Each visitor gets a session-scoped Qdrant collection that auto-purges every 24 hours
  • Phoenix instrumentation is hard-disabled (no third-party telemetry sees prompts or keys)
  • Every audit-log write passes through utils.pii.redact, which is covered by regression tests for the Groq / OpenAI / Anthropic / HF / Vercel / Qdrant JWT shapes
  • SAR_ALLOW_CLOUD_FOR_HIGH=true: HIGH-sensitivity content may be synthesized on the cloud LLM because this deploy has no local Ollama. The frontend renders a "sensitive: routed to cloud" badge on those answers.
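
The owner-key throttle described in the list above could look roughly like the following. This is a minimal sketch, not the code this Space runs: the names `client_ip` and `OwnerKeyThrottle`, the in-memory sliding window, and the lowercase header dict are all assumptions made for illustration. Only the 3-per-hour limit and the X-Forwarded-For-first lookup come from this README.

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 3600  # one hour, per the README
MAX_REQUESTS = 3       # owner-key requests allowed per IP per window


def client_ip(headers: dict[str, str], peer_ip: str) -> str:
    """Prefer the first X-Forwarded-For entry; fall back to the socket peer.

    Assumes header names have already been lowercased (hypothetical input shape).
    """
    forwarded = headers.get("x-forwarded-for", "")
    first = forwarded.split(",")[0].strip()
    return first or peer_ip


class OwnerKeyThrottle:
    """Hypothetical in-memory sliding-window limiter keyed by client IP."""

    def __init__(self, limit=MAX_REQUESTS, window=WINDOW_SECONDS, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the window can be tested without sleeping
        self._hits = defaultdict(deque)

    def allow(self, ip: str) -> bool:
        now = self.clock()
        hits = self._hits[ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

Without the header lookup, every visitor behind the proxy would share the proxy's address and therefore a single bucket. Trusting the first X-Forwarded-For entry is only safe when the proxy in front of the app sets or sanitises that header.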

Demo personas

| Persona | Clearance | Roles | Sees |
|---|---|---|---|
| engineer | 2 (med) | engineering | public handbook, eng runbook, incident runbook, infra ADR, ML model card, NIST RMF |
| compliance | 3 (high) | compliance, legal | public handbook, security policy, finance Q3, vendor MSA, ML model card, NIST, HR |
| executive | 3 (high) | executive, compliance, engineering | union of the above |

The RBAC filter is enforced at the Qdrant payload layer (org_id keyword + sensitivity_level_int range + roles match-any). Chunks the persona is not authorised to see are never returned, regardless of their cosine-similarity score.
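
As an illustration of the three conditions above, the sketch below builds a filter in the JSON shape accepted by Qdrant's search API (`must` with `match.value`, `range.lte`, and `match.any`). The helper names, the `"demo-org"` value, and the choice of `lte` against the persona's clearance are assumptions; the README names only the three payload fields. `payload_passes` is a local re-implementation for demonstration, not part of Qdrant.

```python
def build_rbac_filter(org_id: str, clearance: int, roles: list[str]) -> dict:
    """Return a Qdrant-style filter dict; all three conditions must hold."""
    return {
        "must": [
            {"key": "org_id", "match": {"value": org_id}},
            # Assumption: a chunk is visible when its level <= persona clearance.
            {"key": "sensitivity_level_int", "range": {"lte": clearance}},
            # match-any: the chunk's roles must overlap the persona's roles.
            {"key": "roles", "match": {"any": roles}},
        ]
    }


def payload_passes(payload: dict, flt: dict) -> bool:
    """Evaluate the filter locally against one chunk payload (illustration only)."""
    org, level, role = flt["must"]
    return (
        payload.get("org_id") == org["match"]["value"]
        and payload.get("sensitivity_level_int", float("inf")) <= level["range"]["lte"]
        and bool(set(payload.get("roles", [])) & set(role["match"]["any"]))
    )


# The "engineer" persona from the table: clearance 2, role "engineering".
engineer_filter = build_rbac_filter("demo-org", 2, ["engineering"])
```

Because the filter is applied inside the vector search, a high-scoring chunk that fails any condition is excluded before ranking rather than trimmed afterwards.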

Endpoints

| Path | Purpose |
|---|---|
| GET /healthz | Liveness probe (used by GitHub Actions keepalive cron) |
| GET /readyz | Readiness: pings Qdrant Cloud + Groq (Ollama skipped here) |
| POST /byok/chat | Public-demo chat (BYOK or throttled owner-key) |
| POST /byok/chat/stream | SSE variant: emits phase / token / blocked / final events |
| GET /byok/audit | Session-scoped audit export (PII redacted, SHA-256 chained) |
| POST /query | Authenticated JWT endpoint (dev / staging compat) |
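
To show what "SHA-256 chained" can mean for the audit export, here is a minimal sketch of a hash chain and its verification. The record layout (`body`, `prev_hash`, `hash`), the all-zero genesis value, and the canonical-JSON encoding are assumptions for illustration; the actual schema returned by GET /byok/audit is not documented in this README.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical anchor for the first entry


def entry_hash(prev_hash: str, body: dict) -> str:
    """Hash the previous digest together with a canonical encoding of the body."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + canonical).encode("utf-8")).hexdigest()


def append_entry(chain: list[dict], body: dict) -> None:
    """Append a new entry linked to the current tail of the chain."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"body": body, "prev_hash": prev, "hash": entry_hash(prev, body)})


def verify_chain(chain: list[dict], start: str = GENESIS) -> bool:
    """Check every link; any edited, removed, or reordered entry breaks the chain."""
    prev = start
    for entry in chain:
        if entry["prev_hash"] != prev:
            return False
        if entry["hash"] != entry_hash(prev, entry["body"]):
            return False
        prev = entry["hash"]
    return True
```

When only a window of entries is available (for example the last 50), verification would start from the first entry's `prev_hash` rather than from the genesis value, which proves the window is internally consistent but not that earlier entries exist.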

Operator notes

  • 600+ tests passing on the source repo at the commit pinned in private/roadmap.md.
  • Built from Dockerfile.hf in the source tree; this Space's copy is renamed to Dockerfile so HF picks it up automatically.
  • CPU Basic hardware (2 vCPU, 16 GB RAM). Cold cross-encoder load adds ~5 s to the first request after wake; subsequent queries answer in <1 s end-to-end against the Vercel frontend.