# NLProxy Firewall Module Reference This document describes the firewall and prompt policy enforcement module in `firewall/firewall.py`. ## Purpose The firewall module defends NLProxy against malicious prompt patterns and injection attacks. It combines deterministic regex detection with optional semantic similarity checks. ## Key Components ### `FirewallAction` Enumerated actions: - `ALLOW` — allow the prompt unchanged. - `ALERT` — allow but log a security warning. - `REWRITE` — sanitize or rewrite the prompt. - `BLOCK` — reject the request. ### `SeverityLevel` Severity classifications: - `LOW` - `MEDIUM` - `HIGH` - `CRITICAL` ### `FirewallRule` Immutable dataclass with: - `name: str` - `pattern: str` - `action: FirewallAction` - `severity: SeverityLevel` - `description: Optional[str]` ### `compile_rule_patterns(rules)` - Compiles regex rules ahead of runtime. - Complexity: O(|R| · m), where |R| is rule count. - Returns structured objects for fast matching. ### `resolve_conflicting_actions(actions)` - Resolves multiple matching rules. - Uses priority order: `BLOCK > REWRITE > ALERT > ALLOW`. ### `PromptFirewall` #### Responsibilities - Validates incoming prompt text before compression. - Applies regex matches against a curated rule set. - Optionally uses semantic attack corpus embeddings. - Provides `check_prompt()` and `rewrite_prompt()` to callers. #### Initialization ```python PromptFirewall( regex_rules=[...], semantic_config=SEMANTIC_FIREWALL_CONFIG, default_mode="block", models_dir=Path("nlproxy") / "models", ) ``` #### Features - Defaults to blocking high-risk jailbreak patterns. - Supports semantic corpus-based detection of prompt injection. - Uses a shared singleton cache for corpus embeddings. - Maintains a clear audit trail via rule names and severities. ## Default Rules Included default rules cover: - Classic jailbreak phrases like "ignore all previous instructions" - System prompt exfiltration attempts - Privilege escalation requests - Data exfiltration requests - SQL injection-like patterns - Potential token leakage patterns ## Semantic Firewall Configuration `SEMANTIC_FIREWALL_CONFIG` defines: - `enabled`: false by default - `model_name`: `all-MiniLM-L6-v2` - `similarity_threshold`: `0.85` - `attack_corpus`: curated attacker phrase samples - `device_preference`: `cpu` ## Dependencies - `numpy` - Optional: `sentence_transformers` ## Performance and Scalability - Regex-only evaluation is fast and deterministic. - Semantic checking adds vector embed cost, so it should be reserved for high-security deployments. - The module is safe to call per-request and is optimized for prompt lengths typical of LLM inputs. ## Edge Cases - If semantic model loading fails, the module should not block standard regex detection. - When multiple rules match, the most restrictive action is enforced. - Rewrite behavior should preserve prompt structure and avoid introducing new attack surface.