Spaces:

rohitsar567
/

InsuranceBot

Sleeping

App Files Files Community

InsuranceBot / kb /security /INDEX.md

rohitsar567

docs: Cluster A (count drift) + Cluster B (deleted-module refs) sweep

4c728a9 about 1 month ago

preview code

Raw

History Blame Contribute Delete

3.19 kB

Security — Upload Gates + Hallucination Defense

Auto-generated. Source modules: backend/security.py + backend/faithfulness.py.

Upload security — 8 gates

Every PDF uploaded via /api/upload-policy runs through these gates before indexing. Pipeline lives in backend/uploaded_docs.py + backend/security.py, governed by ADR-044 (2026-05-27). Failure logs to logs/upload_blocks.jsonl.

#	Gate	Check
1	File mechanics	Magic bytes `%PDF`; size 5KB-25MB; `%%EOF` present; dangerous PDF features (`/JavaScript`, `/Launch`, `/OpenAction`, `/EmbeddedFile`, `/SubmitForm`, `/AA`, `/RichMedia`, `/Movie`, `/Sound`, `/GoToR`); embedded executable signatures (Windows PE, Linux ELF, Mach-O, Java class, shell, HTML/JS, PHP)
2	Content quality	≥1,500 chars text; ≥3 pages; ≥1 insurance keyword match (catches "garbage PDF" uploads)
3	Prompt injection	Regex sweep for "ignore previous instructions", "system prompt reveal", jailbreak markers, role-takeover patterns, im_start/im_end tokens
4	Per-session rate limit	5 uploads/hour/session; 200 chunks/session lifetime
5	Per-IP rate limit	10 uploads/hour/IP (per X-Forwarded-For or peer IP)
6	Encrypted / locked PDF reject	Refuse any PDF that is password-protected or has restrictive permissions blocking text extraction
7	Page-count ceiling	Reject PDFs with >200 pages
8	Hash dedupe + reject-cache	Re-uploads of an already-accepted PDF are deduped; re-uploads of a previously-rejected hash are short-circuited

Beyond the 8, a UIN net-new check + PDF-text fuzzy match against the catalogued 148 also run — uploads that match an existing catalogued policy short-circuit to the catalogued card.

All gates run for EVERY upload. Block on any failure; the audit trail captures the reason set. See README §2.8 and 70-docs/60-decisions/ADR-044-uploaded-pdf-parity.md for the dual-write model and the heuristic-floor / Gemini extraction chain.

Hallucination defense — structural grounding (post-KI-225, 2026-05-15)

The single brain (backend/single_brain.py) quotes only what its tools returned:

Tool	What it returns	Where the brain reads it
`retrieve_policies`	top-k policy-wording chunks from Chroma	`backend/brain_tools.py::retrieve_policies`
`get_policy_facts`	curated structured facts + verbatim `source_quote`	`backend/brain_tools.py::get_policy_facts`

The brain's system prompt enforces "cite only what the tools returned" as a structural invariant. The pre-KI-225 architecture had a separate backend/faithfulness.py 4-gate post-hoc verifier — that module was removed in the single-brain consolidation because the single LLM's tool-grounded output flow makes the post-hoc verification structurally unnecessary. Source: ADR-040 + KI-225.

What WE can't (yet) check

LLM determinism (DeepSeek-V3 / Sarvam-M can produce slightly different output at temperature=0).
Insurer-side PDF tampering — we trust the source PDF was real at download.
Embedding model drift — pinned to BGE-small-en-v1.5.

These are explicit limits documented in kb/AUDIT_TRAIL.md §5.