Nomearod Claude Opus 4.6 (1M context) committed on
Commit f7bb777 · 1 Parent(s): 06bc29e

docs: add security architecture section to README and DECISIONS.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Files changed (2)
  1. DECISIONS.md +32 -0
  2. README.md +76 -14
DECISIONS.md CHANGED
@@ -281,3 +281,35 @@ request on first `complete()` call with tools and checks if the response contain
281
  `tool_calls`. The result is cached as `self._supports_tool_calling`. Transient failures
282
  (timeout, 5xx) return `None` and retry on the next call rather than permanently
283
  downgrading to prompt-based fallback.
284
+
285
+ ## Why two-tier injection detection, not three
286
+
287
+ The original design included a middle tier (embedding similarity against known injection examples). It was dropped because the existing embedding model (all-MiniLM-L6-v2) is a general-purpose sentence encoder, not specialized for adversarial detection. Cosine similarity can't distinguish semantic similarity from intent similarity: "how do I ignore a field in Pydantic?" clusters near "ignore previous instructions" in that embedding space. The threshold between "ambiguous" and "suspicious" is an untunable hyperparameter with no ground truth.
288
+
289
+ Two tiers are cleaner: the heuristic regex is deterministic (it matches or it doesn't), and the DeBERTa classifier is probabilistic (confidence score). There is no ambiguous handoff between two probabilistic layers. Deployments without a GPU get heuristic-only detection, which is documented, not hidden.
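The two-tier handoff described above can be sketched roughly as follows. This is an illustrative sketch, not the project's implementation: the patterns, the `Verdict` type, the `detect_injection` name, and the `threshold` default are all assumptions, and the classifier is modeled as any callable that returns a score in [0, 1].

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical patterns for illustration; the real rule set is not shown in this commit.
HEURISTIC_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?(previous|prior)\s+instructions", re.IGNORECASE),
    re.compile(r"reveal\s+(your\s+)?system\s+prompt", re.IGNORECASE),
]


@dataclass
class Verdict:
    safe: bool
    tier: str                      # "heuristic" or "classifier"
    score: Optional[float] = None  # only set by the probabilistic tier


def detect_injection(
    text: str,
    classifier: Optional[Callable[[str], float]] = None,
    threshold: float = 0.5,        # assumed cutoff, not taken from the source
) -> Verdict:
    # Tier 1: deterministic regex. A pattern either matches or it doesn't.
    for pattern in HEURISTIC_PATTERNS:
        if pattern.search(text):
            return Verdict(safe=False, tier="heuristic")

    # No classifier available (e.g. no GPU): heuristic-only verdict.
    if classifier is None:
        return Verdict(safe=True, tier="heuristic")

    # Tier 2: probabilistic classifier returning an injection score.
    score = classifier(text)
    return Verdict(safe=score < threshold, tier="classifier", score=score)
```

Because the first tier never produces a score, there is exactly one probabilistic decision point, which is the property the paragraph above argues for.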
290
+
291
+ ## Why regex + optional spaCy for PII, not a cloud API
292
+
293
+ Three reasons: cost (cloud PII APIs charge per call), latency (adds network round-trip to every retrieved chunk), and data residency (PII leaves the system boundary). Regex covers the PII types with actual legal/compliance risk: SSNs, credit cards, emails, phone numbers, IP addresses.
294
+
295
+ spaCy NER (PERSON, ORG) is optional because false-positive rates on technical text are unacceptable without domain tuning: "FastAPI" triggers ORG, "Jordan" triggers PERSON. The optional import pattern (`try: import spacy`) degrades gracefully with a logged warning, so nothing crashes if someone sets `use_ner: true` without installing spaCy.
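A minimal sketch of the regex pass plus the optional-import guard, under stated assumptions: the three patterns below are simplified stand-ins (the commit does not show the real ones), and `redact` and its return shape are hypothetical names. The NER pass itself is left out because it needs a downloaded spaCy model.

```python
import logging
import re

logger = logging.getLogger(__name__)

try:
    import spacy  # optional dependency
except ImportError:
    spacy = None

# Simplified, illustrative patterns; real-world PII regexes are usually stricter.
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "IPV4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def redact(text: str, use_ner: bool = False) -> tuple[str, dict[str, int]]:
    """Replace regex-detectable PII with [LABEL] and count redactions per type."""
    counts: dict[str, int] = {}
    for label, pattern in PII_PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        if n:
            counts[label] = n

    if use_ner and spacy is None:
        # Degrade to regex-only instead of raising.
        logger.warning("use_ner is set but spaCy is not installed; running regex-only")
    # If spaCy is installed, a PERSON/ORG pass over the text would run here (omitted).

    return text, counts
```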
296
+
297
+ ## Why append-only JSONL for audit, not SQLite
298
+
299
+ One codepath, one format, no config branching. JSONL is append-only by nature: no schema migrations, no transactions, no connection pooling. Log rotation handles size. `jq` provides immediate queryability without building a custom API.
300
+
301
+ The original design included an optional SQLite backend and a query endpoint (`GET /admin/audit`). Both were dropped: SQLite adds a second storage codepath with no consumer, and the query endpoint would require API key authentication, an inconsistency when `/ask` itself has no auth.
302
+
303
+ JSONL imports trivially into SQLite/DuckDB if structured queries are needed later. No bridges burned.
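To make the "no bridges burned" point concrete, here is a rough sketch of appending records and later loading them into SQLite. The field names `session_id` and `injection_verdict.safe` are taken from the README's `jq` examples; the function names and the table layout are assumptions for illustration.

```python
import json
import sqlite3


def append_record(path: str, record: dict) -> None:
    """Append one JSON object per line; existing lines are never rewritten."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, separators=(",", ":")) + "\n")


def load_into_sqlite(path: str, conn: sqlite3.Connection) -> int:
    """Import a JSONL audit log into a (hypothetical) `audit` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS audit (session_id TEXT, safe INTEGER, raw TEXT)"
    )
    rows = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            rec = json.loads(line)
            safe = rec.get("injection_verdict", {}).get("safe", True)
            rows.append((rec.get("session_id"), int(safe), line))
    conn.executemany("INSERT INTO audit VALUES (?, ?, ?)", rows)
    return len(rows)
```

The raw line is stored alongside the extracted columns so that nothing is lost if the record schema grows later.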
304
+
305
+ ## Why HMAC-SHA256 IP hashing in audit logs
306
+
307
+ HMAC-SHA256 with a server secret hashes client IPs before logging. Plain SHA-256 was considered but rejected: the IPv4 address space (~4.3 billion) is small enough that unsalted hashes are reversible by offline enumeration. HMAC-SHA256 with a secret key makes precomputation infeasible without the key. The key is sourced from an explicit parameter, `AUDIT_HMAC_KEY` env var, or (with a logged warning) a random per-process fallback.
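The key-sourcing order and the hash itself might look like the sketch below. Only the `AUDIT_HMAC_KEY` variable name comes from the text above; `resolve_key` and `hash_ip` are hypothetical names, and the 32-byte fallback length is an assumption.

```python
import hashlib
import hmac
import logging
import os
import secrets
from typing import Optional

logger = logging.getLogger(__name__)


def resolve_key(explicit_key: Optional[bytes] = None) -> bytes:
    """Pick the HMAC key: explicit parameter, then env var, then random fallback."""
    if explicit_key is not None:
        return explicit_key
    env_key = os.environ.get("AUDIT_HMAC_KEY")
    if env_key:
        return env_key.encode("utf-8")
    # Fallback: hashes will not be comparable across processes or restarts.
    logger.warning("AUDIT_HMAC_KEY not set; using a random per-process key")
    return secrets.token_bytes(32)


def hash_ip(ip: str, key: bytes) -> str:
    """Keyed hash of the client IP; enumerating IPv4 space is useless without the key."""
    return hmac.new(key, ip.encode("utf-8"), hashlib.sha256).hexdigest()
```

With a stable key, the same IP always maps to the same digest, so per-client correlation in the log still works without storing the address itself.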
308
+
309
+ ## Why three output validators, not four
310
+
311
+ The original design included a "length/format sanity check" (reject suspiciously short responses or raw JSON in natural-language context). Dropped because the calculator tool returns short numeric answers and the tech docs domain legitimately contains code blocks and JSON examples. Every false positive erodes trust in the validation layer. The three remaining checks (PII leakage, URL hallucination, blocklist) are deterministic with clear pass/fail semantics.
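A sketch of what three deterministic pass/fail checks could look like. All names here (`validate_output`, `hallucinated_urls`, the result keys) are assumptions, the URL regex is deliberately simple, and the blocklist is modeled as plain case-insensitive substrings even though the project may use patterns.

```python
import re
from typing import Iterable

URL_RE = re.compile(r"https?://[^\s)>\]]+")


def _urls(text: str) -> set[str]:
    # Strip trailing sentence punctuation so "https://x.dev/." equals "https://x.dev/".
    return {u.rstrip(".,;:") for u in URL_RE.findall(text)}


def hallucinated_urls(answer: str, chunks: Iterable[str]) -> set[str]:
    """URLs in the answer that appear in none of the retrieved chunks."""
    allowed: set[str] = set()
    for chunk in chunks:
        allowed |= _urls(chunk)
    return _urls(answer) - allowed


def validate_output(
    answer: str,
    chunks: Iterable[str],
    blocklist: Iterable[str],
    pii_patterns: Iterable[re.Pattern],
) -> dict[str, bool]:
    """Return one boolean per check; True means the check passed."""
    lowered = answer.lower()
    return {
        "pii_leakage": not any(p.search(answer) for p in pii_patterns),
        "url_hallucination": not hallucinated_urls(answer, chunks),
        "blocklist": not any(term.lower() in lowered for term in blocklist),
    }
```

Each check is a pure function of the answer and the retrieved chunks, so the same input always yields the same verdict.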
312
+
313
+ ## Why buffer-then-validate for streaming output
314
+
315
+ The `/ask/stream` endpoint buffers all events from the orchestrator, validates the assembled answer, and only then sends anything to the client. This means the client waits for the full answer before receiving any content chunks. The orchestrator emits the final synthesis as a single chunk (tool-use iterations are not streamed), so the buffering adds no perceptible latency. The alternative, streaming chunks immediately and appending a safety marker, leaks unsafe content to any client that stops reading after the `done` event.
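The buffer-then-validate flow can be illustrated with a plain generator. The event shape (`{"type": "chunk", "text": ...}`, `{"type": "done"}`), the `error` event, and the `buffered_stream` name are hypothetical; only the `done` event name appears in the text above.

```python
from typing import Callable, Iterable, Iterator


def buffered_stream(
    events: Iterable[dict],
    is_safe: Callable[[str], bool],
) -> Iterator[dict]:
    """Drain the orchestrator, validate the full answer, then emit events."""
    buffered = list(events)  # nothing reaches the client until this completes
    answer = "".join(e["text"] for e in buffered if e.get("type") == "chunk")

    if not is_safe(answer):
        # Unsafe content is never sent; the client only sees a refusal.
        yield {"type": "error", "text": "Response withheld by output validation."}
        yield {"type": "done"}
        return

    yield from buffered
```

Since validation happens before the first `yield`, a client that disconnects early cannot have received content that later fails a check.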
README.md CHANGED
@@ -134,12 +134,74 @@ flowchart LR
134
  end
135
  ```
136
 
137
  ## Engineering Scope
138
 
139
  - **Agent design & evaluation**: Built two independent orchestration approaches (custom tool-calling loop + LangChain AgentExecutor) and evaluated both on identical metrics to quantify framework tradeoffs
140
  - **Retrieval engineering**: Hybrid FAISS + BM25 with Reciprocal Rank Fusion, cross-encoder reranking, evaluated across 27 questions with P@5, R@5, citation accuracy
141
  - **Infrastructure:** Kubernetes (Helm), Terraform (GCP/GKE), self-hosted LLM serving (vLLM on Modal + Docker Compose)
142
  - **MLOps:** Provider comparison benchmark (API vs self-hosted, real measured data)
 
143
  - **Production engineering**: FastAPI, Docker, CI/CD, structured logging, rate limiting, SSE streaming, conversation sessions, 205 deterministic tests with mock providers
144
 
145
  <details><summary>API Reference</summary>
@@ -211,17 +273,17 @@ All tests use MockProvider + MockEmbeddingModel. No API keys. No model downloads
211
 
212
  See [DECISIONS.md](DECISIONS.md) for rationale on building from primitives, RRF over score normalization, negative evaluation cases, deterministic eval + optional LLM judge, and more.
213
 
214
- ### V1 → V2 Evolution
215
-
216
- | Feature | V1 | V2 |
217
- |---------|----|----|
218
- | Grounded refusal | 0/5 | Threshold gate |
219
- | Retrieval P@5 | 0.70 | 0.74 (cross-encoder reranking) |
220
- | Provider support | OpenAI only | OpenAI + Anthropic + self-hosted vLLM |
221
- | Provider resilience | None | Retry + backoff |
222
- | Rate limiting | None | 10 RPM per IP |
223
- | Streaming | None | SSE (`/ask/stream`) |
224
- | Conversation memory | Stateless | SQLite sessions |
225
- | Infrastructure | Local only | Docker, K8s (Helm), Terraform (GKE), Modal |
226
- | CI/CD | None | GitHub Actions |
227
- | Tests | 97 | 205 |
 
134
  end
135
  ```
136
 
137
+ ## Security Architecture
138
+
139
+ Defense-in-depth pipeline with four guardrails. Each stage is independently configurable and degrades gracefully.
140
+
141
+ ```
142
+       User Input
143
+            │
144
+            ▼
145
+ ┌──────────────────────┐
146
+ │ Injection Detection  │  Tier 1: heuristic regex (local, <1ms)
147
+ │ (pre-retrieval)      │  Tier 2: DeBERTa classifier (Modal GPU)
148
+ └──────────┬───────────┘
149
+            │ safe
150
+            ▼
151
+ ┌──────────────────────┐
152
+ │ Retrieval            │  FAISS + BM25 + RRF + cross-encoder
153
+ │ (existing pipeline)  │
154
+ └──────────┬───────────┘
155
+            │
156
+            ▼
157
+ ┌──────────────────────┐
158
+ │ PII Redaction        │  regex (always) + spaCy NER (optional)
159
+ │ (post-retrieval)     │
160
+ └──────────┬───────────┘
161
+            │
162
+            ▼
163
+ ┌──────────────────────┐
164
+ │ LLM Generation       │  OpenAI / Anthropic / vLLM (Modal)
165
+ │ (existing pipeline)  │
166
+ └──────────┬───────────┘
167
+            │
168
+            ▼
169
+ ┌──────────────────────┐
170
+ │ Output Validation    │  PII leakage + URL check + blocklist
171
+ │ (post-generation)    │
172
+ └──────────┬───────────┘
173
+            │
174
+            ▼
175
+ ┌──────────────────────┐
176
+ │ Audit Log            │  JSONL, IP-hashed, rotated
177
+ │ (every request)      │
178
+ └──────────┬───────────┘
179
+            │
180
+            ▼
181
+        Response
182
+ ```
183
+
184
+ **Injection detection** uses a two-tier architecture: heuristic regex rules catch common patterns (<1ms), and an optional DeBERTa classifier on Modal GPU provides high-confidence classification. Without a GPU, the system runs heuristic-only: honest degradation, not silent failure.
185
+
186
+ **PII redaction** runs regex patterns for high-risk types (SSN, credit card, email, phone, IP address) on every retrieved chunk before it enters the LLM context window. Optional spaCy NER adds PERSON/ORG detection for deployments that need it.
187
+
188
+ **Output validation** catches PII leakage (LLM reconstructing redacted data), URL hallucination (URLs not in retrieved chunks), and blocklisted patterns (system prompt fragments, API keys).
189
+
190
+ **Audit logging** writes one structured JSON record per request to an append-only JSONL file with HMAC-SHA256 hashed IPs, injection verdicts, PII redaction counts, and output validation results.
191
+
192
+ ```bash
193
+ # Query the audit log with jq
194
+ jq 'select(.injection_verdict.safe == false)' logs/audit.jsonl
195
+ jq 'select(.session_id == "abc123")' logs/audit.jsonl
196
+ ```
197
+
198
  ## Engineering Scope
199
 
200
  - **Agent design & evaluation**: Built two independent orchestration approaches (custom tool-calling loop + LangChain AgentExecutor) and evaluated both on identical metrics to quantify framework tradeoffs
201
  - **Retrieval engineering**: Hybrid FAISS + BM25 with Reciprocal Rank Fusion, cross-encoder reranking, evaluated across 27 questions with P@5, R@5, citation accuracy
202
  - **Infrastructure:** Kubernetes (Helm), Terraform (GCP/GKE), self-hosted LLM serving (vLLM on Modal + Docker Compose)
203
  - **MLOps:** Provider comparison benchmark (API vs self-hosted, real measured data)
204
+ - **Security engineering**: Prompt injection detection (heuristic + ML classifier), PII redaction, output validation, structured audit logging with GDPR-compliant IP hashing
205
  - **Production engineering**: FastAPI, Docker, CI/CD, structured logging, rate limiting, SSE streaming, conversation sessions, 205 deterministic tests with mock providers
206
 
207
  <details><summary>API Reference</summary>
 
273
 
274
  See [DECISIONS.md](DECISIONS.md) for rationale on building from primitives, RRF over score normalization, negative evaluation cases, deterministic eval + optional LLM judge, and more.
275
 
276
+ ### V1 → V2 → V3 Evolution
277
+
278
+ | Feature | V1 | V2 | V3 |
279
+ |---------|----|----|-----|
280
+ | Grounded refusal | 0/5 | Threshold gate | Threshold gate |
281
+ | Retrieval P@5 | 0.70 | 0.74 (cross-encoder) | 0.74 |
282
+ | Provider support | OpenAI only | OpenAI + Anthropic + vLLM | Same |
283
+ | Streaming | None | SSE (`/ask/stream`) | SSE |
284
+ | Infrastructure | Local only | Docker, K8s, Terraform, Modal | Same |
285
+ | **Injection detection** | None | None | Two-tier (heuristic + DeBERTa) |
286
+ | **PII redaction** | None | None | Regex + optional NER |
287
+ | **Output validation** | None | None | PII leakage + URL + blocklist |
288
+ | **Audit logging** | None | None | JSONL, HMAC-hashed IPs |
289
+ | Tests | 97 | 205 | 288+ |