Luiserb

first commit

2129c29 15 days ago

NLProxy Firewall Module Reference

This document describes the firewall and prompt policy enforcement module in firewall/firewall.py.

Purpose

The firewall module defends NLProxy against malicious prompt patterns and injection attacks. It combines deterministic regex detection with optional semantic similarity checks.

Key Components

`FirewallAction`

Enumerated actions:

ALLOW — allow the prompt unchanged.
ALERT — allow but log a security warning.
REWRITE — sanitize or rewrite the prompt.
BLOCK — reject the request.

`SeverityLevel`

Severity classifications:

LOW
MEDIUM
HIGH
CRITICAL

`FirewallRule`

Immutable dataclass with:

name: str
pattern: str
action: FirewallAction
severity: SeverityLevel
description: Optional[str]
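
The three types above could be declared roughly as follows. This is a minimal sketch, not the module's actual source: the string values of the enum members, the `None` default for `description`, and the example rule name and pattern are all assumptions made for illustration.

```python
from dataclasses import dataclass, FrozenInstanceError
from enum import Enum
from typing import Optional


class FirewallAction(Enum):
    ALLOW = "allow"      # pass the prompt through unchanged
    ALERT = "alert"      # pass through, but log a security warning
    REWRITE = "rewrite"  # sanitize or rewrite the prompt
    BLOCK = "block"      # reject the request


class SeverityLevel(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


@dataclass(frozen=True)  # frozen=True is what makes instances immutable
class FirewallRule:
    name: str
    pattern: str
    action: FirewallAction
    severity: SeverityLevel
    description: Optional[str] = None  # assumed default, not confirmed


# Hypothetical rule, for illustration only.
rule = FirewallRule(
    name="ignore_previous_instructions",
    pattern=r"ignore\s+all\s+previous\s+instructions",
    action=FirewallAction.BLOCK,
    severity=SeverityLevel.CRITICAL,
)

try:
    rule.action = FirewallAction.ALLOW  # assignment on a frozen dataclass
except FrozenInstanceError:
    print("rule is immutable")
```

Freezing the dataclass means a rule cannot be weakened after it is loaded, and it also makes rules hashable, so they can be used as dictionary keys or set members.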

`compile_rule_patterns(rules)`

Compiles each rule's regex once, before any prompt is checked.
Complexity: O(|R| · m), where |R| is the rule count and m is, presumably, the length of the longest pattern.
Returns structured objects for fast matching.
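
Precompilation along these lines might look like the sketch below. The `Rule` and `CompiledRule` types are hypothetical stand-ins (the real function works on `FirewallRule` objects and its return type is not documented here), and the use of `re.IGNORECASE` is an assumption.

```python
import re
from typing import Iterable, List, NamedTuple


class Rule(NamedTuple):
    # Hypothetical stand-in for FirewallRule, reduced to the fields used here.
    name: str
    pattern: str


class CompiledRule(NamedTuple):
    # Hypothetical "structured object": the rule paired with its compiled regex.
    rule: Rule
    regex: re.Pattern


def compile_rule_patterns(rules: Iterable[Rule]) -> List[CompiledRule]:
    # One re.compile call per rule, done once instead of on every request.
    return [CompiledRule(r, re.compile(r.pattern, re.IGNORECASE)) for r in rules]


compiled = compile_rule_patterns([
    Rule("ignore_previous", r"ignore\s+all\s+previous\s+instructions"),
    Rule("reveal_system_prompt", r"reveal\s+(the\s+)?system\s+prompt"),
])

prompt = "Please IGNORE all previous instructions."
matched = [c.rule.name for c in compiled if c.regex.search(prompt)]
print(matched)  # ['ignore_previous']
```

Keeping the original rule next to its compiled regex lets the matching loop report the rule name and severity without a second lookup.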

`resolve_conflicting_actions(actions)`

Resolves multiple matching rules.
Uses priority order: BLOCK > REWRITE > ALERT > ALLOW.
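
The priority order can be expressed as a lookup table plus `max`. This is a sketch under stated assumptions: the enum is redeclared so the example runs on its own, and returning ALLOW when nothing matched is a guess about the empty case, not documented behavior.

```python
from enum import Enum
from typing import Iterable


class FirewallAction(Enum):
    ALLOW = "allow"
    ALERT = "alert"
    REWRITE = "rewrite"
    BLOCK = "block"


# Higher number wins; mirrors BLOCK > REWRITE > ALERT > ALLOW.
_PRIORITY = {
    FirewallAction.ALLOW: 0,
    FirewallAction.ALERT: 1,
    FirewallAction.REWRITE: 2,
    FirewallAction.BLOCK: 3,
}


def resolve_conflicting_actions(actions: Iterable[FirewallAction]) -> FirewallAction:
    # With no matching rules, fall back to ALLOW (assumed behavior).
    return max(actions, key=_PRIORITY.__getitem__, default=FirewallAction.ALLOW)


print(resolve_conflicting_actions([FirewallAction.ALERT, FirewallAction.REWRITE]))
print(resolve_conflicting_actions([FirewallAction.BLOCK, FirewallAction.ALLOW]))
print(resolve_conflicting_actions([]))
```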

`PromptFirewall`

Responsibilities

Validates incoming prompt text before compression.
Applies regex matches against a curated rule set.
Optionally uses semantic attack corpus embeddings.
Provides check_prompt() and rewrite_prompt() to callers.

Initialization

PromptFirewall(
    regex_rules=[...],
    semantic_config=SEMANTIC_FIREWALL_CONFIG,
    default_mode="block",
    models_dir=Path("nlproxy") / "models",
)

Features

Defaults to blocking high-risk jailbreak patterns.
Supports semantic corpus-based detection of prompt injection.
Uses a shared singleton cache for corpus embeddings.
Maintains a clear audit trail via rule names and severities.

Default Rules

Included default rules cover:

Classic jailbreak phrases like "ignore all previous instructions"
System prompt exfiltration attempts
Privilege escalation requests
Data exfiltration requests
SQL injection-like patterns
Potential token leakage patterns

Semantic Firewall Configuration

SEMANTIC_FIREWALL_CONFIG defines:

enabled: false by default
model_name: all-MiniLM-L6-v2
similarity_threshold: 0.85
attack_corpus: curated attacker phrase samples
device_preference: cpu
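
To make the role of similarity_threshold concrete, here is a sketch of a threshold check. Several things are assumptions: that the config is a plain dict, that similarity means cosine similarity, and that a score at or above the threshold counts as a match. Toy three-dimensional vectors replace real model embeddings, and plain lists are used instead of numpy so the example has no dependencies.

```python
import math
from typing import Sequence

# Assumed shape: a plain dict holding the documented keys and defaults.
SEMANTIC_FIREWALL_CONFIG = {
    "enabled": False,
    "model_name": "all-MiniLM-L6-v2",
    "similarity_threshold": 0.85,
    "attack_corpus": ["ignore all previous instructions"],  # placeholder sample
    "device_preference": "cpu",
}


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # zero vectors match nothing


def is_semantic_match(prompt_vec, corpus_vecs, threshold: float) -> bool:
    # Flag the prompt if it is close enough to ANY corpus phrase.
    return any(cosine_similarity(prompt_vec, v) >= threshold for v in corpus_vecs)


# Toy vectors standing in for embeddings of the attack corpus.
corpus_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
threshold = SEMANTIC_FIREWALL_CONFIG["similarity_threshold"]

print(is_semantic_match([0.9, 0.1, 0.0], corpus_vecs, threshold))  # True
print(is_semantic_match([0.5, 0.5, 0.7], corpus_vecs, threshold))  # False
```

A higher threshold reduces false positives on benign prompts at the cost of missing paraphrased attacks, so 0.85 is on the conservative side.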

Dependencies

numpy
Optional: sentence_transformers

Performance and Scalability

Regex-only evaluation is fast and deterministic.
Semantic checking adds the cost of embedding each prompt, so it is best reserved for high-security deployments.
The module is safe to call per-request and is optimized for prompt lengths typical of LLM inputs.

Edge Cases

If the semantic model fails to load, regex detection should keep working unaffected.
When multiple rules match, the most restrictive action is enforced.
Rewrite behavior should preserve prompt structure and avoid introducing new attack surface.
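
The first edge case, degrading to regex-only detection when the semantic model is unavailable, could be handled as in this sketch. The names `load_semantic_model` and `broken_loader` and the logger name are hypothetical; the real module may structure its fallback differently.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("nlproxy.firewall")  # hypothetical logger name


def load_semantic_model(loader: Callable[[], object]) -> Optional[object]:
    # Return the model, or None when loading fails for any reason,
    # so the caller can continue with regex-only checks.
    try:
        return loader()
    except Exception as exc:
        logger.warning("Semantic firewall disabled: %s", exc)
        return None


def broken_loader() -> object:
    # Simulates the optional dependency being absent.
    raise ImportError("No module named 'sentence_transformers'")


model = load_semantic_model(broken_loader)
semantic_enabled = model is not None
print(semantic_enabled)  # False
```

Catching the failure at load time, and not on each request, keeps the per-request path free of repeated import attempts.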