NLProxy / README.md
Luiserb's picture
first commit
2129c29
|
Raw
History Blame Contribute Delete
14.1 kB

A newer version of the Gradio SDK is available: 6.19.0

Upgrade
metadata
title: NLProxy Enterprise Demo
emoji: πŸ›‘οΈ
colorFrom: blue
colorTo: gray
sdk: gradio
sdk_version: 4.36.1
app_file: app.py
pinned: false
license: other
tags:
  - llm
  - prompt-compression
  - security
  - firewall
  - nli
  - pii-masking
  - enterprise

NLProxy

Prompt Security & Compression Gateway for LLMs

The enterprise-grade, offline-first middleware that cuts your LLM bill by up to 60% while enforcing zero-trust security.

License PyPI GitHub


πŸŽ›οΈ About This Interactive Demo

This Hugging Face Space serves as a live, interactive sandbox for the NLProxy Pipeline. Instead of just reading about it, you can visually audit how NLProxy protects, compresses, and verifies LLM interactions in real-time.

Upon startup, this Space dynamically clones the official intellideep/nlproxy repository, downloads the required ONNX/NLI models, and exposes the complete 5-Step Lifecycle via a Gradio interface.


πŸ“‰ The Problem with LLMs Today

Every time you send a prompt to OpenAI, Anthropic, or Gemini, you are doing three dangerous things:

  1. Burning money on redundant words, pleasantries, and verbose context.
  2. Leaking PII (emails, IPs, internal code) to third-party servers.
  3. Exposing yourself to jailbreaks, prompt injections, and semantic drift.

NLProxy fixes all three before the prompt ever leaves your infrastructure.


🎯 Why NLProxy?

πŸ’° Slash Your LLM Bill (Semantic Compression)

NLProxy doesn't just strip stopwords. It uses KMeans/Ward semantic clustering and ONNX-quantized embeddings to understand the meaning of your prompt. It identifies redundant sentences and compresses them, reducing token usage by 40% to 60% without losing critical intent.

Result: A $1,000/month OpenAI bill becomes $400.

πŸ—οΈ The 6-Stage Defense Pipeline (Visualized in this Demo)

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    NLProxy Pipeline                          β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                              β”‚
β”‚  πŸ“₯ INPUT: "Ignore instructions... IP 192.168.1.1..."       β”‚
β”‚       ↓                                                      β”‚
β”‚  πŸ›‘οΈ [1] FIREWALL                                            β”‚
β”‚       β”œβ”€ PromptFirewall.check_prompt()                      β”‚
β”‚       └─ Action: BLOCK / ALERT / REWRITE / ALLOW            β”‚
β”‚       ↓                                                      β”‚
β”‚  πŸ“‰ [2] COMPRESS                                            β”‚
β”‚       β”œβ”€ CompressionService.compress_batch()                β”‚
β”‚       β”œβ”€ Shield β†’ Segment β†’ Cluster β†’ Reconstruct           β”‚
β”‚       └─ Output: "IP: __PROT_xxx. Do NOT use Python..."     β”‚
β”‚       ↓                                                      β”‚
β”‚  πŸ”’ [3] SAFETY                                              β”‚
β”‚       β”œβ”€ SafetyChecker.validate()                           β”‚
β”‚       └─ Reinserts critical intents if missing              β”‚
β”‚       ↓                                                      β”‚
β”‚  πŸ€– [4] LLM CALL (Simulated in this demo)                   β”‚
β”‚       β”œβ”€ LLMOrchestrator.generate()                         β”‚
β”‚       └─ OpenAI / Claude / Gemini / Local                   β”‚
β”‚       ↓                                                      β”‚
β”‚  🧹 [5] CORRECT                                             β”‚
β”‚       β”œβ”€ ResponseCorrector.correct()                        β”‚
β”‚       └─ Applies FORBID/MANDATE + redacts unauthorized      β”‚
β”‚       ↓                                                      β”‚
β”‚  πŸ” [6] VERIFY                                              β”‚
β”‚       β”œβ”€ PostLLMVerifier.verify()                           β”‚
β”‚       β”œβ”€ NLI contradiction detection                        β”‚
β”‚       └─ Confidence: 0.30 β†’ 0.85 (after auto-correction)    β”‚
β”‚       ↓                                                      β”‚
β”‚  πŸ“€ OUTPUT: "Solution in Java. Connection protected."       β”‚
β”‚                                                              β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

πŸ›‘οΈ Unbreakable Security (Firewall & Verification)

  • Pre-Flight: A multi-layer firewall blocks jailbreaks, system prompt extraction, and SQLi using regex + semantic attack detection.
  • Post-Flight: NLI (Natural Language Inference) models verify that the LLM didn't hallucinate forbidden actions or leak unauthorized entities.

Real‑World Use Cases

Use Case NLProxy Benefit
Chat‑based customer support Reduces token costs by 50% while preserving mandatory disclaimers and safety rules.
Code generation assistant Masks API keys and internal IPs; enforces β€œdo not use Python” restrictions.
Legal document analysis Preserves confidentiality and privilege statements even after heavy compression.
Multi‑tenant SaaS Semantic cache + domain filtering reduces redundant LLM calls by 70‑80%.
On‑premise deployment Works fully offline, no external dependencies (optional Redis for cache).

Components

Component Function
Firewall Regex + semantic injection detection (jailbreak, system prompt extraction, data exfiltration).
Shield Entity masking (IPs, emails, codes, PII) and extraction of semantic restrictions (FORBID/MANDATE).
Segmenter Language‑aware sentence splitting + ONNX‑accelerated sentence embeddings (384‑d MiniLM).
Compressor Clustering‑based redundancy removal (Ward / K‑Means) with variance filtering.
Reconstructor Re‑injects masked entities, removes stopwords, and computes token/cost savings.
SafetyChecker Verifies critical intents/restrictions survive compression; re‑inserts missing sentences.
LLMOrchestrator Multi‑provider (Gemini, OpenAI, Claude, etc.) with retry, circuit breaker, and rate limiting.
PostLLMVerifier NLI‑based contradiction detection, unauthorized entity detection, semantic drift monitoring.
ResponseCorrector Sanitizes LLM output: removes prohibited entities, enforces mandates, redacts placeholders.
Semantic Cache RedisVL‑powered vector cache (cosine similarity), optional TTL and domain filtering.

Benchmark

Comparison with State‑of‑the‑Art (SOTA)

Solution Injection Prevention Entity Masking Prompt Compression Restriction Enforcement Post‑LLM Verification Offline Open Source Multi‑LLM
NLProxy βœ… βœ… βœ… (semantic) βœ… βœ… βœ… βœ… (BSL 1.1) βœ…
LangChain ❌ (no built‑in) ❌ ❌ (only templates) ❌ ❌ ⚠️ partial βœ… βœ…
Semantic Kernel ❌ ❌ ❌ ❌ ❌ ⚠️ partial βœ… βœ…
LLMLingua / Selective Context ❌ ❌ βœ… (token‑level) ❌ ❌ βœ… βœ… ❌
Rebuff (injection) βœ… ❌ ❌ ❌ ❌ ⚠️ βœ… ❌
Lakera Guard βœ… βœ… (basic) ❌ ❌ ❌ ❌ ❌ ❌
Azure OpenAI Content Safety βœ… ❌ ❌ ❌ ❌ ❌ ❌ βœ…

Key differentiators:

  • NLProxy is the only open‑source solution that combines prompt security, semantic compression, constraint enforcement, and response verification in a single pipeline.
  • All critical components work offline (embedding & NLI models are downloaded once and run locally).
  • The business‑friendly BSL 1.1 license allows free use for indie developers, students, and non‑profits, while requiring a commercial license for large enterprises (>$1M revenue).

Compression Efficiency

Metric Value
Average token reduction (general) 45‑55%
Reduction on legal/finance documents 35‑45% (conservative)
Reduction on code prompts 55‑65%
Compression latency (per prompt) 50‑120 ms (CPU), 20‑40 ms (GPU)
Embedding model all‑MiniLM‑L6‑v2 (384 dim, ONNX)
Clustering method Auto‑select Ward (<200 sent) / K‑Means

Security & Verification

Check Accuracy / Throughput
Injection detection (regex) >99% on known patterns (MITRE ATLAS)
Semantic injection (embedding) 92% recall @ 0.85 threshold (optional)
Entity masking 100% of IPs, emails, dates, hashes
NLI contradiction detection 78‑85% accuracy (distilroberta‑base)
Restriction enforcement (FORBID) 100% (exact match)
Post‑LLM verification latency +30‑60 ms per request (NLI enabled)

End‑to‑End Latency

Configuration P95 Latency (ms)
Compression only (no NLI, no cache) 120‑180
Compression + Firewall + Shield 150‑220
Full pipeline + NLI verification 200‑300
Full pipeline + Semantic Cache (hit) <10

Scalability

Component Limit / Sizing Guideline
Max prompt length 100k chars (configurable)
Concurrent requests Limited by --workers + thread pool (default 8)
Embedding batch size 128 sentences (can be increased with more memory)
Redis cache capacity Unlimited (depends on Redis memory)
Multi‑LLM failover Supports fallback chains (OpenAI β†’ Claude β†’ Gemini)

πŸ“„ License

NLProxy is released under the Business Source License 1.1 (BSL 1.1).

  • βœ… Free for indie developers, students, non‑profits, and small businesses (revenue < $1M).
  • 🏒 Large enterprises (revenue β‰₯ $1M) require a commercial license – contact us for pricing.
  • πŸ”“ After five years from the release date, the code automatically converts to Apache 2.0.

See the LICENSE.md file for full text.


πŸ’¬ Support & Contact

We welcome contributions, but please open an issue first to discuss.