title: SecureAgentRAG API
emoji: 🛡️
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false
short_description: Privacy-first multi-agent RAG (BYOK demo)

SecureAgentRAG API

Production backend for the SecureAgentRAG public demo.

Frontend: https://secureagentrag-web.vercel.app
Source: https://github.com/moazmo/secureagentrag (branch deploy/prod-launch)
License: MIT

This Space hosts the FastAPI surface only. The Streamlit UI on main remains for local development; recruiters interact with the platform via the Next.js frontend deployed on Vercel.

Mode

Runs in BYOK (Bring Your Own Key) mode:

- `POST /byok/chat` accepts visitor-supplied LLM credentials via headers
- `POST /byok/chat/stream` is the SSE variant that surfaces phase / token / blocked / final events for the live trace UI
- `GET /byok/audit` returns the visitor's last 50 PII-redacted audit entries so the frontend can display the SHA-256 chain
- Owner-key fallback is throttled to 3 requests per IP per hour (Groq free tier protection) and consults `X-Forwarded-For` first so the throttle is not bypassed by HF's reverse proxy
- Each visitor gets a session-scoped Qdrant collection that auto-purges every 24 hours
- Phoenix instrumentation is hard-disabled (no third-party telemetry sees prompts or keys)
- Every audit-log persist runs through `utils.pii.redact` with regression tests for the Groq / OpenAI / Anthropic / HF / Vercel / Qdrant JWT shapes
- `SAR_ALLOW_CLOUD_FOR_HIGH=true` -- HIGH-sensitivity content is allowed to synthesize on the cloud LLM since this deploy has no local Ollama. The frontend renders a "sensitive: routed to cloud" badge on those answers.
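
The owner-key throttle described above can be approximated with a per-IP sliding window. The following is a minimal sketch, not the code running in this Space: `client_ip` and `OwnerKeyThrottle` are hypothetical names, the state is held in process memory, and it assumes the first `X-Forwarded-For` hop is the visitor's address, which only holds when a trusted proxy sets that header.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Mapping, Optional

LIMIT = 3              # owner-key requests per window (value from this README)
WINDOW_SECONDS = 3600  # one hour


def client_ip(headers: Mapping[str, str], peer_ip: str) -> str:
    """Prefer the first X-Forwarded-For hop; fall back to the socket peer.

    Assumes lowercase header keys (Starlette headers are case-insensitive,
    a plain dict is not).
    """
    forwarded = headers.get("x-forwarded-for", "")
    first_hop = forwarded.split(",")[0].strip()
    return first_hop or peer_ip


class OwnerKeyThrottle:
    """Hypothetical in-memory sliding-window counter keyed by client IP."""

    def __init__(self, limit: int = LIMIT, window: float = WINDOW_SECONDS) -> None:
        self.limit = limit
        self.window = window
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

A request handler would call `allow(client_ip(request.headers, peer))` only when the visitor supplied no key of their own, and reject the request when it returns `False`.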

Demo personas

| Persona | Clearance | Roles | Sees |
| --- | --- | --- | --- |
| engineer | 2 (med) | engineering | public handbook, eng runbook, incident runbook, infra ADR, ML model card, NIST RMF |
| compliance | 3 (high) | compliance, legal | public handbook, security policy, finance Q3, vendor MSA, ML model card, NIST, HR |
| executive | 3 (high) | executive, compliance, engineering | union of the above |

The RBAC filter is enforced at the Qdrant payload layer (org_id keyword + sensitivity_level_int range + roles match-any). Chunks the persona is not authorised to see are physically not returned, regardless of cosine-similarity score.
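
To make the three conditions concrete, here is a sketch of how such a filter could be built as a Qdrant REST-style filter dictionary. The payload keys (`org_id`, `sensitivity_level_int`, `roles`) and the persona values come from this README; the `rbac_filter` helper, the `demo-org` value used in the usage note, and the choice of `lte` for the range are assumptions made for illustration.

```python
from typing import Any, Dict, List

# Persona clearances and roles as listed in the table above.
PERSONAS: Dict[str, Dict[str, Any]] = {
    "engineer": {"clearance": 2, "roles": ["engineering"]},
    "compliance": {"clearance": 3, "roles": ["compliance", "legal"]},
    "executive": {"clearance": 3, "roles": ["executive", "compliance", "engineering"]},
}


def rbac_filter(org_id: str, persona: str) -> Dict[str, List[Dict[str, Any]]]:
    """Build a payload filter in which all three conditions must hold."""
    p = PERSONAS[persona]
    return {
        "must": [
            # Exact keyword match on the tenant.
            {"key": "org_id", "match": {"value": org_id}},
            # Assumption: a chunk is visible when its level <= the clearance.
            {"key": "sensitivity_level_int", "range": {"lte": p["clearance"]}},
            # Match-any: the chunk must carry at least one of the persona's roles.
            {"key": "roles", "match": {"any": p["roles"]}},
        ]
    }
```

For example, `rbac_filter("demo-org", "engineer")` yields a filter that excludes any chunk with `sensitivity_level_int` above 2. Because the filter is applied inside the vector search itself, excluded chunks never reach the ranking step.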

Endpoints

| Path | Purpose |
| --- | --- |
| `GET /healthz` | Liveness probe (used by GitHub Actions keepalive cron) |
| `GET /readyz` | Readiness -- pings Qdrant Cloud + Groq (Ollama skipped here) |
| `POST /byok/chat` | Public-demo chat (BYOK or throttled owner-key) |
| `POST /byok/chat/stream` | SSE variant -- emits phase / token / blocked / final events |
| `GET /byok/audit` | Session-scoped audit export (PII redacted, SHA-256 chained) |
| `POST /query` | Authenticated JWT endpoint (dev / staging compat) |
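
The "SHA-256 chained" property of the audit export can be illustrated with a small hash chain in which each entry commits to the hash of the one before it. This is a sketch under stated assumptions: the entry layout (`payload`, `prev_hash`, `hash`), the all-zero genesis value, and the canonical JSON serialization are hypothetical and may differ from what `GET /byok/audit` actually returns.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # hypothetical seed used as prev_hash of the first entry


def entry_hash(prev_hash: str, payload: Dict[str, Any]) -> str:
    """Hash the previous hash together with a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(chain: List[Dict[str, Any]], payload: Dict[str, Any]) -> None:
    """Append an (already redacted) payload, linking it to the chain tail."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append(
        {"payload": payload, "prev_hash": prev, "hash": entry_hash(prev, payload)}
    )


def verify_chain(chain: List[Dict[str, Any]]) -> bool:
    """Recompute every link; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev:
            return False
        if entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True
```

A client that receives the exported entries could run a check like `verify_chain` to confirm that no entry was altered or dropped from the middle of the sequence.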

Operator notes

- 600+ tests passing on the source repo at the commit pinned in `private/roadmap.md`.
- Built from `Dockerfile.hf` in the source tree -- this Space copy is renamed to `Dockerfile` so HF picks it up automatically.
- CPU Basic hardware (2 vCPU, 16 GB RAM). Cold cross-encoder load adds ~5 s to the first request after wake; subsequent queries answer in <1 s end-to-end against the Vercel frontend.