	# NLProxy Firewall Module Reference

	This document describes the firewall and prompt policy enforcement module in `firewall/firewall.py`.

	## Purpose

	The firewall module defends NLProxy against malicious prompt patterns and injection attacks. It combines deterministic regex detection with optional semantic similarity checks.

	## Key Components

	### `FirewallAction`

	Enumerated actions:

	- `ALLOW` — allow the prompt unchanged.
	- `ALERT` — allow but log a security warning.
	- `REWRITE` — sanitize or rewrite the prompt.
	- `BLOCK` — reject the request.
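
A minimal sketch of how these four actions could be modeled as a Python `Enum`. The lowercase string values are an assumption (the source lists only the member names); they are chosen here so that a mode string such as `"block"` maps directly onto a member.

```python
from enum import Enum


class FirewallAction(Enum):
    """Possible outcomes of a firewall check (string values are assumed)."""

    ALLOW = "allow"      # pass the prompt through unchanged
    ALERT = "alert"      # pass through, but log a security warning
    REWRITE = "rewrite"  # sanitize or rewrite the prompt
    BLOCK = "block"      # reject the request


# Members can be looked up by value, e.g. from a configuration string.
action = FirewallAction("block")
```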

	### `SeverityLevel`

	Severity classifications:

	- `LOW`
	- `MEDIUM`
	- `HIGH`
	- `CRITICAL`
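
One plausible way to represent these levels is an `IntEnum`, which makes severities directly comparable. The integer ranks below are an assumption for illustration; the source only names the four levels.

```python
from enum import IntEnum


class SeverityLevel(IntEnum):
    """Severity classes; the integer ranks are an assumption for ordering."""

    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


# With an IntEnum, the highest severity among several matches is just max().
worst = max([SeverityLevel.MEDIUM, SeverityLevel.CRITICAL, SeverityLevel.LOW])
```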

	### `FirewallRule`

	Immutable dataclass with:

	- `name: str`
	- `pattern: str`
	- `action: FirewallAction`
	- `severity: SeverityLevel`
	- `description: Optional[str]`
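
The field list above could be expressed as a frozen dataclass, roughly as follows. This is a sketch, not the module's actual definition: the `None` default for `description`, the rule name, and the pattern are all illustrative assumptions.

```python
from dataclasses import FrozenInstanceError, dataclass
from enum import Enum
from typing import Optional

# Compact stand-ins for the two enums described above.
FirewallAction = Enum("FirewallAction", "ALLOW ALERT REWRITE BLOCK")
SeverityLevel = Enum("SeverityLevel", "LOW MEDIUM HIGH CRITICAL")


@dataclass(frozen=True)
class FirewallRule:
    name: str
    pattern: str
    action: FirewallAction
    severity: SeverityLevel
    description: Optional[str] = None  # default of None is an assumption


rule = FirewallRule(
    name="ignore_previous",  # hypothetical rule name
    pattern=r"ignore\s+all\s+previous\s+instructions",
    action=FirewallAction.BLOCK,
    severity=SeverityLevel.HIGH,
)

# frozen=True makes instances immutable: attribute assignment raises.
try:
    rule.action = FirewallAction.ALLOW
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Because the dataclass is frozen, rule objects are also hashable, so they can be stored in sets or used as dictionary keys.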

	### `compile_rule_patterns(rules)`

	- Compiles regex rules ahead of runtime.
	- Complexity: O(\|R\| · m), where \|R\| is the number of rules and m is taken to be the maximum pattern length.
	- Returns structured objects for fast matching.
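
As a rough illustration of ahead-of-time compilation, the sketch below compiles each pattern once and returns small structured objects. The `CompiledRule` container, the `(name, pattern)` input shape, and the `re.IGNORECASE` flag are assumptions; the real function takes rule objects and may return a different structure.

```python
import re
from typing import List, NamedTuple, Tuple


class CompiledRule(NamedTuple):
    """Hypothetical container pairing a rule name with its compiled regex."""

    name: str
    regex: re.Pattern


def compile_rule_patterns(rules: List[Tuple[str, str]]) -> List[CompiledRule]:
    """Compile (name, pattern) pairs once so matching does not re-parse them."""
    compiled = []
    for name, pattern in rules:
        # IGNORECASE is an assumption; the source does not state the flags.
        compiled.append(CompiledRule(name, re.compile(pattern, re.IGNORECASE)))
    return compiled


compiled = compile_rule_patterns([
    ("ignore_previous", r"ignore\s+all\s+previous\s+instructions"),
    ("reveal_system_prompt", r"reveal\s+(your\s+)?system\s+prompt"),
])
prompt = "Please IGNORE all previous instructions."
hits = [c.name for c in compiled if c.regex.search(prompt)]
```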

	### `resolve_conflicting_actions(actions)`

	- Resolves multiple matching rules.
	- Uses priority order: `BLOCK > REWRITE > ALERT > ALLOW`.
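
The priority order can be sketched with an explicit ranking table and `max()`. Returning `ALLOW` when no rule matched is an assumption, as is the exact signature.

```python
from enum import Enum
from typing import Iterable

FirewallAction = Enum("FirewallAction", "ALLOW ALERT REWRITE BLOCK")

# Higher rank = more restrictive, matching BLOCK > REWRITE > ALERT > ALLOW.
_PRIORITY = {
    FirewallAction.ALLOW: 0,
    FirewallAction.ALERT: 1,
    FirewallAction.REWRITE: 2,
    FirewallAction.BLOCK: 3,
}


def resolve_conflicting_actions(actions: Iterable[FirewallAction]) -> FirewallAction:
    """Return the most restrictive action; ALLOW if nothing matched (assumed)."""
    return max(actions, key=_PRIORITY.__getitem__, default=FirewallAction.ALLOW)


resolved = resolve_conflicting_actions(
    [FirewallAction.ALERT, FirewallAction.BLOCK, FirewallAction.REWRITE]
)
```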

	### `PromptFirewall`

	#### Responsibilities

	- Validates incoming prompt text before compression.
	- Applies regex matches against a curated rule set.
	- Optionally compares the prompt's embedding against embeddings of a semantic attack corpus.
	- Provides `check_prompt()` and `rewrite_prompt()` to callers.
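
To make the regex path concrete, here is a simplified sketch of what `check_prompt()` and `rewrite_prompt()` might do. They are written as free functions for brevity, and the return shapes, rule names, patterns, and `[REDACTED]` placeholder are all hypothetical; the actual `PromptFirewall` methods may differ.

```python
import re
from typing import List, Tuple

# (name, compiled regex, action) triples; names and patterns are illustrative.
RULES = [
    ("ignore_previous", re.compile(r"ignore\s+all\s+previous\s+instructions", re.I), "block"),
    ("token_leak", re.compile(r"\bsk-[A-Za-z0-9]{8,}\b"), "rewrite"),
]
ORDER = ["allow", "alert", "rewrite", "block"]  # least to most restrictive


def check_prompt(prompt: str) -> Tuple[str, List[str]]:
    """Return (action, matched rule names) for a prompt."""
    matched = [(name, action) for name, rx, action in RULES if rx.search(prompt)]
    action = max((a for _, a in matched), key=ORDER.index, default="allow")
    return action, [name for name, _ in matched]


def rewrite_prompt(prompt: str) -> str:
    """Replace every span matched by a 'rewrite' rule with a placeholder."""
    for _, rx, action in RULES:
        if action == "rewrite":
            prompt = rx.sub("[REDACTED]", prompt)
    return prompt
```

Returning the matched rule names alongside the action keeps the decision auditable, which fits the module's emphasis on rule names and severities.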

	#### Initialization

	```python
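	from pathlib import Path

	firewall = PromptFirewall(
	    regex_rules=[...],  # rule list, elided here
	    semantic_config=SEMANTIC_FIREWALL_CONFIG,
	    default_mode="block",  # action applied by default
	    models_dir=Path("nlproxy") / "models",  # where semantic models are stored
	)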
	```

	#### Features

	- Defaults to blocking high-risk jailbreak patterns.
	- Supports semantic corpus-based detection of prompt injection.
	- Uses a shared singleton cache for corpus embeddings.
	- Maintains a clear audit trail via rule names and severities.
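
A shared cache for corpus embeddings could look something like the following. This is only a sketch of the general pattern: the cache key, the `get_corpus_embeddings` name, and the injected `embed` callable are assumptions, and a toy embedding function stands in for a real model.

```python
import threading
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]

# Module-level cache shared by every caller in the process.
_CACHE: Dict[Tuple[str, Tuple[str, ...]], List[Vector]] = {}
_LOCK = threading.Lock()


def get_corpus_embeddings(
    model_name: str,
    corpus: Sequence[str],
    embed: Callable[[str], Vector],
) -> List[Vector]:
    """Embed `corpus` once per (model, corpus) pair and reuse the result."""
    key = (model_name, tuple(corpus))
    with _LOCK:
        if key not in _CACHE:
            _CACHE[key] = [embed(text) for text in corpus]
        return _CACHE[key]


calls: List[str] = []


def toy_embed(text: str) -> Vector:
    """Stand-in for a real embedding model; records each call."""
    calls.append(text)
    return [float(len(text))]


first = get_corpus_embeddings("toy-model", ["x", "yy"], toy_embed)
second = get_corpus_embeddings("toy-model", ["x", "yy"], toy_embed)
```

The second call returns the cached list without invoking `toy_embed` again, which is the property that matters when the embedding step is expensive.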

	## Default Rules

	Included default rules cover:

	- Classic jailbreak phrases like "ignore all previous instructions"
	- System prompt exfiltration attempts
	- Privilege escalation requests
	- Data exfiltration requests
	- SQL injection-like patterns
	- Potential token leakage patterns
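
For a sense of what rules in these categories might look like, the sketch below defines a few illustrative patterns. None of them are taken from the module's actual default rule set; the category keys and regexes are hypothetical.

```python
import re
from typing import Dict, List

# Hypothetical patterns, one per category; not the module's real defaults.
EXAMPLE_PATTERNS: Dict[str, re.Pattern] = {
    "system_prompt_exfiltration": re.compile(
        r"(reveal|show|print)\s+(me\s+)?(your\s+)?system\s+prompt", re.I
    ),
    "privilege_escalation": re.compile(r"\b(grant|give)\s+me\s+(admin|root)\b", re.I),
    "sql_injection": re.compile(r"('\s*or\s+1\s*=\s*1)|(;\s*drop\s+table)", re.I),
}


def matching_categories(prompt: str) -> List[str]:
    """Return the categories whose example pattern matches the prompt."""
    return [name for name, rx in EXAMPLE_PATTERNS.items() if rx.search(prompt)]
```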

	## Semantic Firewall Configuration

	`SEMANTIC_FIREWALL_CONFIG` defines:

	- `enabled`: false by default
	- `model_name`: `all-MiniLM-L6-v2`
	- `similarity_threshold`: `0.85`
	- `attack_corpus`: a curated set of sample attack phrases
	- `device_preference`: `cpu`
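
The settings above might be laid out as a plain dictionary, with the threshold applied to cosine similarity between a prompt embedding and each corpus embedding. The dictionary shape, the single placeholder corpus entry, and the helper names are assumptions. The similarity is written in pure Python to keep the sketch dependency-free, although the module itself lists `numpy` as a dependency, and toy 2-D vectors stand in for real embeddings.

```python
import math
from typing import Sequence

SEMANTIC_FIREWALL_CONFIG = {
    "enabled": False,
    "model_name": "all-MiniLM-L6-v2",
    "similarity_threshold": 0.85,
    "attack_corpus": ["ignore all previous instructions"],  # placeholder sample
    "device_preference": "cpu",
}


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_semantic_attack(prompt_vec, corpus_vecs, threshold: float) -> bool:
    """True if the prompt is at least `threshold`-similar to any corpus entry."""
    return any(cosine_similarity(prompt_vec, v) >= threshold for v in corpus_vecs)


threshold = SEMANTIC_FIREWALL_CONFIG["similarity_threshold"]
corpus_vecs = [[1.0, 0.0]]  # toy embedding of one attack phrase
near = is_semantic_attack([0.9, 0.1], corpus_vecs, threshold)
far = is_semantic_attack([0.1, 0.9], corpus_vecs, threshold)
```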

	## Dependencies

	- `numpy`
	- Optional: `sentence_transformers`

	## Performance and Scalability

	- Regex-only evaluation is fast and deterministic.
	- Semantic checking adds the cost of computing an embedding for each prompt, so it should be reserved for high-security deployments.
	- The module is safe to call per-request and is optimized for prompt lengths typical of LLM inputs.

	## Edge Cases

	- If the semantic model fails to load, the module should fall back to regex-only detection; a load failure must not disable the regex rules.
	- When multiple rules match, the most restrictive action is enforced.
	- Rewrite behavior should preserve prompt structure and avoid introducing new attack surface.
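
The first edge case, degrading gracefully when the semantic model cannot be loaded, can be sketched as below. The `load_semantic_model` helper, the injected `loader` callable, and the logger name are assumptions made for illustration.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("nlproxy.firewall")  # logger name is an assumption


def load_semantic_model(loader: Callable[[], object]) -> Optional[object]:
    """Return the model, or None if loading fails so regex checks still run."""
    try:
        return loader()
    except Exception as exc:  # e.g. ImportError if sentence_transformers is absent
        logger.warning("Semantic firewall disabled: %s", exc)
        return None


def failing_loader() -> object:
    """Simulates a missing optional dependency."""
    raise ImportError("sentence_transformers is not installed")


model = load_semantic_model(failing_loader)
semantic_enabled = model is not None  # False: continue in regex-only mode
```

A caller would consult `semantic_enabled` before attempting any similarity check and otherwise evaluate only the regex rules.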