Spaces:
Running
Running
A newer version of the Gradio SDK is available: 6.19.0
NLProxy Firewall Module Reference
This document describes the firewall and prompt policy enforcement module in firewall/firewall.py.
Purpose
The firewall module defends NLProxy against malicious prompt patterns and injection attacks. It combines deterministic regex detection with optional semantic similarity checks.
Key Components
FirewallAction
Enumerated actions:
ALLOW— allow the prompt unchanged.ALERT— allow but log a security warning.REWRITE— sanitize or rewrite the prompt.BLOCK— reject the request.
SeverityLevel
Severity classifications:
LOWMEDIUMHIGHCRITICAL
FirewallRule
Immutable dataclass with:
name: strpattern: straction: FirewallActionseverity: SeverityLeveldescription: Optional[str]
compile_rule_patterns(rules)
- Compiles regex rules ahead of runtime.
- Complexity: O(|R| · m), where |R| is rule count.
- Returns structured objects for fast matching.
resolve_conflicting_actions(actions)
- Resolves multiple matching rules.
- Uses priority order:
BLOCK > REWRITE > ALERT > ALLOW.
PromptFirewall
Responsibilities
- Validates incoming prompt text before compression.
- Applies regex matches against a curated rule set.
- Optionally uses semantic attack corpus embeddings.
- Provides
check_prompt()andrewrite_prompt()to callers.
Initialization
PromptFirewall(
regex_rules=[...],
semantic_config=SEMANTIC_FIREWALL_CONFIG,
default_mode="block",
models_dir=Path("nlproxy") / "models",
)
Features
- Defaults to blocking high-risk jailbreak patterns.
- Supports semantic corpus-based detection of prompt injection.
- Uses a shared singleton cache for corpus embeddings.
- Maintains a clear audit trail via rule names and severities.
Default Rules
Included default rules cover:
- Classic jailbreak phrases like "ignore all previous instructions"
- System prompt exfiltration attempts
- Privilege escalation requests
- Data exfiltration requests
- SQL injection-like patterns
- Potential token leakage patterns
Semantic Firewall Configuration
SEMANTIC_FIREWALL_CONFIG defines:
enabled: false by defaultmodel_name:all-MiniLM-L6-v2similarity_threshold:0.85attack_corpus: curated attacker phrase samplesdevice_preference:cpu
Dependencies
numpy- Optional:
sentence_transformers
Performance and Scalability
- Regex-only evaluation is fast and deterministic.
- Semantic checking adds vector embed cost, so it should be reserved for high-security deployments.
- The module is safe to call per-request and is optimized for prompt lengths typical of LLM inputs.
Edge Cases
- If semantic model loading fails, the module should not block standard regex detection.
- When multiple rules match, the most restrictive action is enforced.
- Rewrite behavior should preserve prompt structure and avoid introducing new attack surface.