christianweyer committed
Commit e19575e · verified · 1 Parent(s): 7f3c126

Add SECURITY.md (so model card can link relatively)

Files changed (1)
  1. SECURITY.md +75 -0
SECURITY.md ADDED
@@ -0,0 +1,75 @@
+ # Security
+
+ ## Status: Conference Talk Demo
+
+ This repository accompanies a conference keynote on local on-device AI. It is
+ a **demo**, not a production system. Several design decisions traded security
+ hardening for narrative clarity on stage. Read this document before deploying
+ anything from this repo to a network where untrusted users can interact with
+ it.
+
+ ---
+
+ ## What's in scope (defended by design)
+
+ - **Local-only execution by default.** All models run on the developer's
+   machine via `llama-server`. The default `network_mode` blocks the
+   cloud-comparison endpoints and prevents accidental egress.
+ - **Read-only SQL.** The `sql_query` tool whitelists `SELECT` statements and
+   blocks `INSERT/UPDATE/DELETE/DROP/CREATE/ALTER/...` via a regex pre-filter
+   before the query reaches SQLite. See `src/engine/tools/sql_query.py`.
+ - **Code-execution sandbox.** The `calculator` tool uses `simpleeval` with
+   no builtins, no imports, and a whitelisted operator set, so a malicious
+   expression cannot execute arbitrary Python.
+ - **Prompt-injection pre-filter.** A multi-layer defence in
+   `src/engine/agent/intent_classifier.py`:
+   - ~30 regex patterns for known injection phrasings (English + German)
+   - Gibberish detector (character entropy)
+   - Non-ASCII normaliser
+   - LogReg-confidence threshold (0.65): low-confidence queries route to a
+     canned refusal message instead of through the agent
+ - **Network mode toggle.** A single switch in the Observatory UI (and the
+   `network_mode` env var) gates the cloud-comparison endpoint. With it off,
+   no model call can reach an external API.
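The read-only SQL bullet above describes a regex pre-filter. The following is a minimal sketch of that idea, not the code in `src/engine/tools/sql_query.py`; the function name `is_read_only` and the exact keyword list are assumptions made for illustration.

```python
import re

# Hypothetical keyword list; the filter in the repository may block more or fewer.
FORBIDDEN = re.compile(
    r"\b(INSERT|UPDATE|DELETE|DROP|CREATE|ALTER|ATTACH|PRAGMA|REPLACE)\b",
    re.IGNORECASE,
)
STARTS_WITH_SELECT = re.compile(r"^\s*SELECT\b", re.IGNORECASE)


def is_read_only(query: str) -> bool:
    """Return True only for a single SELECT statement with no write keywords."""
    # Drop trailing semicolons, then reject anything that still contains one
    # (a crude guard against stacked statements such as "SELECT 1; DROP ...").
    body = query.strip().rstrip(";").strip()
    if ";" in body:
        return False
    if not STARTS_WITH_SELECT.match(body):
        return False
    return FORBIDDEN.search(body) is None


print(is_read_only("SELECT * FROM orders"))         # True
print(is_read_only("SELECT 1; DROP TABLE orders"))  # False
```

A keyword filter like this also rejects harmless queries that merely mention a blocked word inside a string literal, which is one reason it is demo-grade. Opening the database read-only, for example with `sqlite3.connect("file:demo.db?mode=ro", uri=True)` (the file name is a placeholder), is a stronger complement.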
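To illustrate what "no builtins, no imports, and a whitelisted operator set" means for the calculator, here is a sketch that uses only the standard-library `ast` module. It is deliberately not `simpleeval` and not the repository's tool; `safe_eval` and the operator tables are hypothetical names chosen for this example.

```python
import ast
import operator

# Whitelisted operators; any other node type is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def safe_eval(expression: str) -> float:
    """Evaluate a plain arithmetic expression without calling eval()."""

    def visit(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return visit(node.body)
        # Only int and float literals; bool, str, etc. are refused.
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](visit(node.left), visit(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](visit(node.operand))
        # Names, calls, attribute access, subscripts and so on all end up here.
        raise ValueError(f"disallowed expression element: {type(node).__name__}")

    return visit(ast.parse(expression, mode="eval"))


print(safe_eval("2 + 3 * 4"))  # 14
```

An input such as `__import__('os').system('ls')` parses to a call node and is rejected with `ValueError` before anything runs. Exponentiation is left out of the whitelist on purpose, since very large powers can be used to exhaust CPU or memory.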
+
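The layers of the prompt-injection pre-filter listed above can be sketched as a single gate function. This is an assumption-heavy illustration, not the contents of `intent_classifier.py`: the two patterns, the entropy cut-off `MAX_ENTROPY_BITS`, the use of NFKC as the normaliser, and the function names are all hypothetical. Only the 0.65 confidence threshold comes from this document, and the classifier itself is replaced by a confidence value passed in by the caller.

```python
import math
import re
import unicodedata
from collections import Counter

# Hypothetical patterns; the document mentions ~30 in English and German.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"ignoriere (alle )?(vorherigen|bisherigen) anweisungen", re.IGNORECASE),
]
CONFIDENCE_THRESHOLD = 0.65  # value stated in the document
MAX_ENTROPY_BITS = 4.5       # assumed cut-off, not taken from the repository


def normalise(text: str) -> str:
    """Fold compatibility characters (e.g. full-width letters) to plain forms."""
    return unicodedata.normalize("NFKC", text)


def char_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def should_refuse(query: str, classifier_confidence: float) -> bool:
    """Return True if the query should receive the canned refusal."""
    text = normalise(query)
    if any(p.search(text) for p in INJECTION_PATTERNS):
        return True
    if char_entropy(text) > MAX_ENTROPY_BITS:
        return True
    return classifier_confidence < CONFIDENCE_THRESHOLD
```

Normalising before pattern matching matters: without it, a phrase typed in full-width characters would slip past the regex layer. A character-entropy cut-off is a blunt instrument on short inputs, which fits the demo-grade framing of this filter.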
+ ---
+
+ ## What's out of scope (do NOT deploy as-is in these contexts)
+
+ | Concern | Status | Why |
+ |---|---|---|
+ | **Multi-tenant exposure** | ❌ | No per-user auth, no rate limiting, no tenant isolation. The agent is built for a single demo user, not a public API. |
+ | **Adversarial robustness** | ⚠️ Demo-grade | The pre-filter catches ~93% of canned adversarial queries; it is not pen-tested against a motivated attacker. |
+ | **SQL injection** | ⚠️ Demo-grade | The regex defence stops `DROP TABLE`-shaped attacks but is not a substitute for parameterised queries against a production DB. |
+ | **PII / regulated data** | ❌ | No PII redaction, no audit logging, no encryption at rest beyond what your OS provides. The demo data (`Nextera`) is fully synthetic. |
+ | **Network exposure helpers** | ❌ | The bundled scripts to expose the local server (ngrok / Tailscale) were removed for the public release. If you re-add them, put a proper auth layer in front. |
+ | **Supply chain pinning** | ⚠️ Snapshot | Dependencies are pinned at the talk's state. The npm package-locks are clean *as of this commit* but will drift; re-audit before any operational deployment. |
+ | **Fine-tuned model safety** | ⚠️ | The fine-tunes were trained on a synthetic scenario. They have no jailbreak resistance or alignment guardrails beyond what the base Gemma/Qwen models ship with. |
+ | **Long-running operation** | ❌ | No telemetry pipeline, no SLO monitoring, no graceful degradation tested beyond what the demo exercises. |
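The SQL injection row points to parameterised queries as the production-grade alternative to a regex filter. A minimal sketch with the standard-library `sqlite3` module follows; the `customers` table and the `find_customer` helper are made up for this example and are not the demo schema.

```python
import sqlite3

# In-memory database with a hypothetical table, just for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers (name) VALUES (?)", [("Alice",), ("Bob",)])


def find_customer(connection: sqlite3.Connection, name: str) -> list[tuple]:
    """Look up customers by name; the value is bound, never spliced into the SQL."""
    cursor = connection.execute(
        "SELECT id, name FROM customers WHERE name = ?", (name,)
    )
    return cursor.fetchall()


hostile = "Bob'; DROP TABLE customers; --"
print(find_customer(conn, "Bob"))    # [(2, 'Bob')]
print(find_customer(conn, hostile))  # []
```

Because the driver passes the value separately from the statement text, the hostile string is compared as a literal name and the table is untouched. This covers values only; when an LLM writes the whole statement, as in an agent tool, binding does not apply and other controls such as a read-only connection are needed.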
+
+ ---
+
+ ## Reporting security issues
+
+ **Public, non-sensitive:** open an issue on the GitHub repo.
+
+ **Sensitive (please don't disclose publicly):** email
+ [christian.weyer@thinktecture.com](mailto:christian.weyer@thinktecture.com).
+
+ There is **no security SLA**. Severe issues will be acknowledged and may be
+ documented; the repository is treated as a finished talk rather than an
+ ongoing project. If you're building on top of this code, you own the
+ hardening required for your deployment environment.
+
+ ---
+
+ ## Threat model summary
+
+ The intended attacker is a **curious-but-not-determined demo participant**:
+ someone who notices the SQL pill or the airplane-mode toggle and tries an
+ adversarial query during the live demo. The defences listed above handle
+ that audience comfortably.
+
+ The intended attacker is **not** a motivated red team. If your deployment
+ environment includes that audience, this repository is a starting reference
+ for the architectural patterns, not the deployable artefact.