NLProxy Firewall Module Reference

This document describes the firewall and prompt policy enforcement module in firewall/firewall.py.

Purpose

The firewall module defends NLProxy against malicious prompt patterns and injection attacks. It combines deterministic regex detection with optional semantic similarity checks.

Key Components

FirewallAction

Enumerated actions:

  • ALLOW — allow the prompt unchanged.
  • ALERT — allow but log a security warning.
  • REWRITE — sanitize or rewrite the prompt.
  • BLOCK — reject the request.

SeverityLevel

Severity classifications:

  • LOW
  • MEDIUM
  • HIGH
  • CRITICAL

FirewallRule

Immutable dataclass with:

  • name: str
  • pattern: str
  • action: FirewallAction
  • severity: SeverityLevel
  • description: Optional[str]
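
As a minimal sketch of how these three types could fit together, the block below declares the two enumerations and a frozen dataclass. The enum string values, the `None` default for `description`, and the example rule are assumptions made for illustration; they are not taken from firewall/firewall.py.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FirewallAction(Enum):
    # String values are assumed; the source only names the members.
    ALLOW = "allow"
    ALERT = "alert"
    REWRITE = "rewrite"
    BLOCK = "block"


class SeverityLevel(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


@dataclass(frozen=True)  # frozen=True makes instances immutable
class FirewallRule:
    name: str
    pattern: str
    action: FirewallAction
    severity: SeverityLevel
    description: Optional[str] = None  # default is an assumption


# Hypothetical rule, for illustration only.
rule = FirewallRule(
    name="ignore_previous_instructions",
    pattern=r"ignore\s+all\s+previous\s+instructions",
    action=FirewallAction.BLOCK,
    severity=SeverityLevel.HIGH,
)
```

Because the dataclass is frozen, assigning to `rule.name` after construction raises an error, so a rule set cannot be altered accidentally at request time.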

compile_rule_patterns(rules)

  • Compiles regex rules ahead of runtime.
  • Complexity: O(|R| · m), where |R| is the number of rules and m is presumably the maximum pattern length (the source does not define m).
  • Returns structured objects for fast matching.
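
The precompilation step might look roughly like the sketch below. It takes plain `(name, pattern)` pairs rather than `FirewallRule` objects to stay self-contained; the `CompiledRule` container and the case-insensitive flag are assumptions, not the module's actual return type or settings.

```python
import re
from typing import Iterable, List, NamedTuple, Tuple


class CompiledRule(NamedTuple):
    # Hypothetical container pairing a rule name with its compiled regex.
    name: str
    regex: re.Pattern


def compile_rule_patterns(rules: Iterable[Tuple[str, str]]) -> List[CompiledRule]:
    """Compile every pattern once so request handling only runs searches."""
    compiled = []
    for name, pattern in rules:
        # IGNORECASE is an assumption; attack phrases vary in casing.
        compiled.append(CompiledRule(name, re.compile(pattern, re.IGNORECASE)))
    return compiled


compiled_rules = compile_rule_patterns(
    [("ignore_previous_instructions", r"ignore\s+all\s+previous\s+instructions")]
)
```

Compiling once at startup moves the per-rule cost out of the request path, which is where the O(|R| · m) figure applies.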

resolve_conflicting_actions(actions)

  • Resolves multiple matching rules.
  • Uses priority order: BLOCK > REWRITE > ALERT > ALLOW.
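
The priority order above can be sketched as a lookup table plus `max`. The enum is redeclared here so the block runs on its own, and returning ALLOW for an empty input is an assumption about behavior the source does not specify.

```python
from enum import Enum
from typing import Iterable


class FirewallAction(Enum):
    ALLOW = "allow"
    ALERT = "alert"
    REWRITE = "rewrite"
    BLOCK = "block"


# Higher number means more restrictive: BLOCK > REWRITE > ALERT > ALLOW.
_PRIORITY = {
    FirewallAction.ALLOW: 0,
    FirewallAction.ALERT: 1,
    FirewallAction.REWRITE: 2,
    FirewallAction.BLOCK: 3,
}


def resolve_conflicting_actions(actions: Iterable[FirewallAction]) -> FirewallAction:
    """Return the most restrictive action among all matches."""
    # Falling back to ALLOW when nothing matched is an assumption.
    return max(actions, key=_PRIORITY.__getitem__, default=FirewallAction.ALLOW)
```

For example, a prompt that triggers both an ALERT rule and a REWRITE rule resolves to REWRITE.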

PromptFirewall

Responsibilities

  • Validates incoming prompt text before compression.
  • Applies regex matches against a curated rule set.
  • Optionally uses semantic attack corpus embeddings.
  • Provides check_prompt() and rewrite_prompt() to callers.

Initialization

from pathlib import Path  # required for the models_dir argument

PromptFirewall(
    regex_rules=[...],
    semantic_config=SEMANTIC_FIREWALL_CONFIG,
    default_mode="block",
    models_dir=Path("nlproxy") / "models",
)

Features

  • Defaults to blocking high-risk jailbreak patterns.
  • Supports semantic corpus-based detection of prompt injection.
  • Uses a shared singleton cache for corpus embeddings.
  • Maintains a clear audit trail via rule names and severities.

Default Rules

Included default rules cover:

  • Classic jailbreak phrases like "ignore all previous instructions"
  • System prompt exfiltration attempts
  • Privilege escalation requests
  • Data exfiltration requests
  • SQL injection-like patterns
  • Potential token leakage patterns
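
To make a few of these categories concrete, the sketch below pairs three of them with illustrative regexes. The rule names and patterns are hypothetical examples written for this document, not the module's shipped defaults.

```python
import re
from typing import Dict, List

# Hypothetical patterns for three of the categories listed above.
EXAMPLE_PATTERNS: Dict[str, re.Pattern] = {
    "jailbreak_ignore_instructions": re.compile(
        r"\bignore\s+(?:all\s+)?previous\s+instructions\b", re.IGNORECASE
    ),
    "system_prompt_exfiltration": re.compile(
        r"\b(?:reveal|show|print)\s+(?:your\s+)?system\s+prompt\b", re.IGNORECASE
    ),
    "sql_injection_like": re.compile(r"'\s*or\s+1\s*=\s*1", re.IGNORECASE),
}


def matched_rule_names(prompt: str) -> List[str]:
    """Return the names of all example rules that match the prompt."""
    return [name for name, regex in EXAMPLE_PATTERNS.items() if regex.search(prompt)]
```

Returning every matching name, rather than stopping at the first hit, is what allows the audit trail and conflict resolution to see all triggered rules.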

Semantic Firewall Configuration

SEMANTIC_FIREWALL_CONFIG defines:

  • enabled: false by default
  • model_name: all-MiniLM-L6-v2
  • similarity_threshold: 0.85
  • attack_corpus: curated attacker phrase samples
  • device_preference: cpu
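
The role of similarity_threshold can be illustrated with a cosine-similarity check against the attack corpus. The sketch assumes cosine similarity is the metric, which the source does not state, and uses toy three-dimensional vectors in plain Python instead of real all-MiniLM-L6-v2 embeddings; the function names are hypothetical.

```python
import math
from typing import Sequence

SIMILARITY_THRESHOLD = 0.85  # value from SEMANTIC_FIREWALL_CONFIG


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def is_semantic_attack(
    prompt_vec: Sequence[float],
    corpus_vecs: Sequence[Sequence[float]],
    threshold: float = SIMILARITY_THRESHOLD,
) -> bool:
    """Flag the prompt if it is close enough to any attack-corpus entry."""
    return any(cosine_similarity(prompt_vec, v) >= threshold for v in corpus_vecs)


# Toy vectors standing in for real sentence embeddings.
toy_corpus = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
```

With these toy vectors, `[0.9, 0.1, 0.0]` is flagged because it nearly coincides with the first corpus entry, while `[1.0, 1.0, 0.0]` sits at about 0.71 from both entries and passes.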

Dependencies

  • numpy
  • Optional: sentence_transformers

Performance and Scalability

  • Regex-only evaluation is fast and deterministic.
  • Semantic checking adds the cost of computing an embedding for each prompt, so it should be reserved for high-security deployments.
  • The module is safe to call per-request and is optimized for prompt lengths typical of LLM inputs.

Edge Cases

  • If semantic model loading fails, the module should not block standard regex detection.
  • When multiple rules match, the most restrictive action is enforced.
  • Rewrite behavior should preserve prompt structure and avoid introducing new attack surface.
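
The first edge case, where a semantic model failure must not disable regex detection, could be handled with a guarded optional import along these lines. The helper name `optional_import` is hypothetical and this is only a sketch of the fallback pattern, not the module's actual loading code.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(module_name: str) -> Optional[ModuleType]:
    """Return the imported module, or None if it cannot be loaded."""
    try:
        return importlib.import_module(module_name)
    except Exception:
        # Broad catch on purpose: a partially installed ML stack can fail
        # with errors other than ImportError, and none should be fatal here.
        return None


def semantic_checks_available(module_name: str = "sentence_transformers") -> bool:
    """True only when the optional semantic backend loaded successfully."""
    return optional_import(module_name) is not None
```

A caller would run the regex rules unconditionally and consult `semantic_checks_available()` only to decide whether to add the semantic pass on top.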