Spaces:
Running
Running
| # NLProxy Firewall Module Reference | |
| This document describes the firewall and prompt policy enforcement module in `firewall/firewall.py`. | |
| ## Purpose | |
| The firewall module defends NLProxy against malicious prompt patterns and injection attacks. It combines deterministic regex detection with optional semantic similarity checks. | |
| ## Key Components | |
| ### `FirewallAction` | |
| Enumerated actions: | |
| - `ALLOW` — allow the prompt unchanged. | |
| - `ALERT` — allow but log a security warning. | |
| - `REWRITE` — sanitize or rewrite the prompt. | |
| - `BLOCK` — reject the request. | |
| ### `SeverityLevel` | |
| Severity classifications: | |
| - `LOW` | |
| - `MEDIUM` | |
| - `HIGH` | |
| - `CRITICAL` | |
| ### `FirewallRule` | |
| Immutable dataclass with: | |
| - `name: str` | |
| - `pattern: str` | |
| - `action: FirewallAction` | |
| - `severity: SeverityLevel` | |
| - `description: Optional[str]` | |
| ### `compile_rule_patterns(rules)` | |
| - Compiles regex rules ahead of runtime. | |
| - Complexity: O(|R| · m), where |R| is rule count. | |
| - Returns structured objects for fast matching. | |
| ### `resolve_conflicting_actions(actions)` | |
| - Resolves multiple matching rules. | |
| - Uses priority order: `BLOCK > REWRITE > ALERT > ALLOW`. | |
| ### `PromptFirewall` | |
| #### Responsibilities | |
| - Validates incoming prompt text before compression. | |
| - Applies regex matches against a curated rule set. | |
| - Optionally uses semantic attack corpus embeddings. | |
| - Provides `check_prompt()` and `rewrite_prompt()` to callers. | |
| #### Initialization | |
| ```python | |
| PromptFirewall( | |
| regex_rules=[...], | |
| semantic_config=SEMANTIC_FIREWALL_CONFIG, | |
| default_mode="block", | |
| models_dir=Path("nlproxy") / "models", | |
| ) | |
| ``` | |
| #### Features | |
| - Defaults to blocking high-risk jailbreak patterns. | |
| - Supports semantic corpus-based detection of prompt injection. | |
| - Uses a shared singleton cache for corpus embeddings. | |
| - Maintains a clear audit trail via rule names and severities. | |
| ## Default Rules | |
| Included default rules cover: | |
| - Classic jailbreak phrases like "ignore all previous instructions" | |
| - System prompt exfiltration attempts | |
| - Privilege escalation requests | |
| - Data exfiltration requests | |
| - SQL injection-like patterns | |
| - Potential token leakage patterns | |
| ## Semantic Firewall Configuration | |
| `SEMANTIC_FIREWALL_CONFIG` defines: | |
| - `enabled`: false by default | |
| - `model_name`: `all-MiniLM-L6-v2` | |
| - `similarity_threshold`: `0.85` | |
| - `attack_corpus`: curated attacker phrase samples | |
| - `device_preference`: `cpu` | |
| ## Dependencies | |
| - `numpy` | |
| - Optional: `sentence_transformers` | |
| ## Performance and Scalability | |
| - Regex-only evaluation is fast and deterministic. | |
| - Semantic checking adds vector embed cost, so it should be reserved for high-security deployments. | |
| - The module is safe to call per-request and is optimized for prompt lengths typical of LLM inputs. | |
| ## Edge Cases | |
| - If semantic model loading fails, the module should not block standard regex detection. | |
| - When multiple rules match, the most restrictive action is enforced. | |
| - Rewrite behavior should preserve prompt structure and avoid introducing new attack surface. | |