Add SECURITY.md (so model card can link relatively)

e19575e verified 4 days ago

4.11 kB

	# Security

	## Status: Conference Talk Demo

	This repository accompanies a conference keynote on local on-device AI. It is
	a demo, not a production system. Several design decisions traded security
	hardening for narrative clarity on stage. Read this document before deploying
	anything from this repo to a network where untrusted users can interact with
	it.

	---

	## What's in scope (defended by design)

	- Local-only execution by default. All models run on the developer's
	machine via `llama-server`. The default `network_mode` blocks the
	cloud-comparison endpoints and prevents accidental egress.
	- Read-only SQL. The `sql_query` tool whitelists `SELECT` statements and
	blocks `INSERT/UPDATE/DELETE/DROP/CREATE/ALTER/...` via regex pre-filter
	before the query reaches SQLite. See `src/engine/tools/sql_query.py`.
	- Code-execution sandbox. The `calculator` tool uses `simpleeval` with
	no builtins, no imports, and a whitelisted operator set — a malicious
	expression cannot exec arbitrary Python.
	- Prompt-injection pre-filter. A multi-layer defence in
	`src/engine/agent/intent_classifier.py`:
	- ~30 regex patterns for known injection phrasings (English + German)
	- Gibberish detector (character entropy)
	- Non-ASCII normaliser
	- LogReg-confidence threshold (0.65) — low-confidence queries route to a
	canned refusal message instead of through the agent
	- Network mode toggle. A single switch in the Observatory UI (and
	`network_mode` env var) gates the cloud-comparison endpoint. With it off,
	no model call can reach an external API.

	---

	## What's out of scope (do NOT deploy as-is in these contexts)

	\| Concern \| Status \| Why \|
	\|---\|---\|---\|
	\| Multi-tenant exposure \| ❌ \| No per-user auth, no rate limiting, no tenant isolation. The agent is built for a single demo user, not a public API. \|
	\| Adversarial robustness \| ⚠️ Demo-grade \| The pre-filter catches ~93% of canned adversarial queries; it is not pen-tested against a motivated attacker. \|
	\| SQL injection \| ⚠️ Demo-grade \| The regex defence stops `DROP TABLE`-shaped attacks but is not a substitute for parameterised queries against a production DB. \|
	\| PII / regulated data \| ❌ \| No PII redaction, no audit logging, no encryption at rest beyond what your OS provides. The demo data (`Nextera`) is fully synthetic. \|
	\| Network exposure helpers \| ❌ \| The bundled scripts to expose the local server (ngrok / Tailscale) were removed for the public release. If you re-add them, put a proper auth layer in front. \|
	\| Supply chain pinning \| ⚠️ Snapshot \| Dependencies are pinned at the talk's state. The npm package-locks are clean as of this commit but will drift; re-audit before any operational deployment. \|
	\| Fine-tuned model safety \| ⚠️ \| The fine-tunes were trained on a synthetic scenario. They have no jailbreak resistance or alignment guardrails beyond what the base Gemma/Qwen models ship with. \|
	\| Long-running operation \| ❌ \| No telemetry pipeline, no SLO monitoring, no graceful degradation tested beyond what the demo exercises. \|

	---

	## Reporting security issues

	Public, non-sensitive: open an issue on the GitHub repo.

	Sensitive (please don't disclose publicly): email
	[christian.weyer@thinktecture.com](mailto:christian.weyer@thinktecture.com).

	There is no security SLA. Severe issues will be acknowledged and may be
	documented; the repository is treated as a finished talk rather than an
	ongoing project. If you're building on top of this code, you own the
	hardening required for your deployment environment.

	---

	## Threat model summary

	The intended attacker is a curious-but-not-determined demo participant:
	someone who notices the SQL pill or the airplane-mode toggle and tries an
	adversarial query during the live demo. The defences listed above handle
	that audience comfortably.

	The intended attacker is not a motivated red team. If your deployment
	environment includes that audience, this repository is a starting reference
	for the architectural patterns, not the deployable artefact.