Security: Upload Gates + Hallucination Defense
Auto-generated. Source modules: backend/security.py + backend/faithfulness.py.
Upload security: 8 gates
Every PDF uploaded via /api/upload-policy runs through these gates before
indexing. Pipeline lives in backend/uploaded_docs.py + backend/security.py,
governed by ADR-044 (2026-05-27). Failures are logged to logs/upload_blocks.jsonl.
| # | Gate | Check |
|---|---|---|
| 1 | File mechanics | Magic bytes %PDF; size 5KB-25MB; %%EOF present; dangerous PDF features (/JavaScript, /Launch, /OpenAction, /EmbeddedFile, /SubmitForm, /AA, /RichMedia, /Movie, /Sound, /GoToR); embedded executable signatures (Windows PE, Linux ELF, Mach-O, Java class, shell, HTML/JS, PHP) |
| 2 | Content quality | ≥1,500 characters of text; ≥3 pages; ≥1 insurance keyword match (catches "garbage PDF" uploads) |
| 3 | Prompt injection | Regex sweep for "ignore previous instructions", "system prompt reveal", jailbreak markers, role-takeover patterns, im_start/im_end tokens |
| 4 | Per-session rate limit | 5 uploads/hour/session; 200 chunks/session lifetime |
| 5 | Per-IP rate limit | 10 uploads/hour/IP (per X-Forwarded-For or peer IP) |
| 6 | Encrypted / locked PDF reject | Refuse any PDF that is password-protected or has restrictive permissions blocking text extraction |
| 7 | Page-count ceiling | Reject PDFs with >200 pages |
| 8 | Hash dedupe + reject-cache | Re-uploads of an already-accepted PDF are deduped; re-uploads of a previously-rejected hash are short-circuited |
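
As an illustration of gate 1, the byte-level checks in the table can be sketched as below. This is a minimal sketch, not the code in backend/security.py: the function name `check_file_mechanics` and the reason strings are hypothetical, and the embedded-executable signature scan is left out.

```python
# Hypothetical sketch of gate 1 (file mechanics). Names and reason strings
# are illustrative assumptions, not taken from backend/security.py.
MIN_SIZE = 5 * 1024            # 5 KB lower bound from the table
MAX_SIZE = 25 * 1024 * 1024    # 25 MB upper bound from the table
DANGEROUS_FEATURES = (
    b"/JavaScript", b"/Launch", b"/OpenAction", b"/EmbeddedFile",
    b"/SubmitForm", b"/AA", b"/RichMedia", b"/Movie", b"/Sound", b"/GoToR",
)


def check_file_mechanics(data: bytes) -> list[str]:
    """Return a list of failure reasons; an empty list means the gate passed."""
    reasons = []
    if not data.startswith(b"%PDF"):
        reasons.append("bad_magic_bytes")
    if not MIN_SIZE <= len(data) <= MAX_SIZE:
        reasons.append("size_out_of_range")
    if b"%%EOF" not in data:
        reasons.append("missing_eof_marker")
    # Plain substring search over the raw bytes of the file.
    for feature in DANGEROUS_FEATURES:
        if feature in data:
            reasons.append(f"dangerous_feature:{feature.decode()}")
    return reasons
```

A raw substring search is deliberately conservative: it can flag a longer name that merely starts with /AA, and it cannot see names hidden inside compressed object streams, so a production gate would likely need to parse the PDF as well.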
Beyond the 8 gates, a UIN net-new check and a PDF-text fuzzy match against the 148 catalogued policies also run; uploads that match an existing catalogued policy short-circuit to the catalogued card.
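
The document does not say which fuzzy-matching algorithm is used. As a rough sketch of the idea, the example below scores an upload's extracted text against each catalogued text with the standard library's `difflib.SequenceMatcher`. The function names, the catalogue shape (policy id mapped to reference text), and the 0.9 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so layout differences do not dominate."""
    return " ".join(text.lower().split())


def best_catalogue_match(upload_text, catalogue, threshold=0.9):
    """Return (policy_id, score) for the closest catalogued text, or None.

    `catalogue` maps a policy id to its reference text. The 0.9 threshold
    is an assumption; the source document does not state one.
    """
    needle = normalize(upload_text)
    best_id, best_score = None, 0.0
    for policy_id, reference in catalogue.items():
        score = SequenceMatcher(None, needle, normalize(reference)).ratio()
        if score > best_score:
            best_id, best_score = policy_id, score
    if best_id is not None and best_score >= threshold:
        return best_id, best_score
    return None
```

`SequenceMatcher` is slow on long inputs, so a real pipeline would probably compare a bounded prefix or a set of shingles rather than the full text of every catalogued PDF.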
All gates run for EVERY upload. Block on any failure; the audit trail captures the reason set. See README §2.8 and 70-docs/60-decisions/ADR-044-uploaded-pdf-parity.md for the dual-write model and the heuristic-floor / Gemini extraction chain.
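
One reading of "all gates run" plus "the audit trail captures the reason set" is that gates are evaluated without early exit and every failure reason is recorded together. The sketch below follows that reading. The function names, the gate signature (bytes in, list of reasons out), and the JSONL record fields are hypothetical; the actual schema of logs/upload_blocks.jsonl is not given here.

```python
import json
import time
from pathlib import Path


def run_gates(data: bytes, gates) -> list[str]:
    """Run every gate (no early exit) and collect all failure reasons."""
    reasons = []
    for name, gate in gates:
        for reason in gate(data):
            reasons.append(f"{name}:{reason}")
    return reasons


def handle_upload(data: bytes, gates, log_path: Path) -> bool:
    """Return True if the upload may be indexed; otherwise append a log record."""
    reasons = run_gates(data, gates)
    if not reasons:
        return True
    # One JSON object per line; the field names are illustrative assumptions.
    record = {"ts": time.time(), "size": len(data), "reasons": reasons}
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return False
```

Collecting every reason instead of stopping at the first failure costs a little extra work per blocked upload, but it makes each log line a complete picture of why the file was rejected.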
Hallucination defense: structural grounding (post-KI-225, 2026-05-15)
The single brain (backend/single_brain.py) quotes only what its tools returned:
| Tool | What it returns | Where the brain reads it |
|---|---|---|
| retrieve_policies | top-k policy-wording chunks from Chroma | backend/brain_tools.py::retrieve_policies |
| get_policy_facts | curated structured facts + verbatim source_quote | backend/brain_tools.py::get_policy_facts |
The brain's system prompt enforces "cite only what the tools returned" as a structural invariant. The pre-KI-225 architecture had a separate backend/faithfulness.py 4-gate post-hoc verifier. That module was removed in the single-brain consolidation because the single LLM's tool-grounded output flow makes post-hoc verification structurally unnecessary. Source: ADR-040 + KI-225.
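
To make the invariant concrete, the sketch below shows what "cite only what the tools returned" means as a checkable property: every quoted span must appear verbatim, up to whitespace, in some tool output. This is not the removed faithfulness.py verifier and not part of the runtime path described above; `ungrounded_quotes` is a hypothetical helper of the kind one might use in an offline test.

```python
def ungrounded_quotes(quotes, tool_outputs):
    """Return the quotes that do not appear verbatim in any tool output.

    Whitespace is collapsed on both sides so line wrapping in the retrieved
    chunks does not cause false alarms. Matching is case-sensitive.
    """
    def norm(text):
        return " ".join(text.split())

    haystacks = [norm(output) for output in tool_outputs]
    return [q for q in quotes if not any(norm(q) in h for h in haystacks)]
```

An empty result means every quote is grounded in tool output; any returned string is a span the brain produced without tool support.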
What WE can't (yet) check
- LLM determinism (DeepSeek-V3 / Sarvam-M can produce slightly different output at temperature=0).
- Insurer-side PDF tampering: we trust the source PDF was real at download.
- Embedding model drift: pinned to BGE-small-en-v1.5.
These are explicit limits documented in kb/AUDIT_TRAIL.md §5.