Spaces:

IntelliDeep
/

NLProxy

Running

App Files Files Community

NLProxy / README.md

Luiserb

first commit

2129c29 17 days ago

preview code

Raw

History Blame Contribute Delete

14.1 kB

A newer version of the Gradio SDK is available: 6.19.0

Upgrade

metadata

title: NLProxy Enterprise Demo
emoji: 🛡️
colorFrom: blue
colorTo: gray
sdk: gradio
sdk_version: 4.36.1
app_file: app.py
pinned: false
license: other
tags:
  - llm
  - prompt-compression
  - security
  - firewall
  - nli
  - pii-masking
  - enterprise

NLProxy

Prompt Security & Compression Gateway for LLMs

The enterprise-grade, offline-first middleware that cuts your LLM bill by up to 60% while enforcing zero-trust security.

🎛️ About This Interactive Demo

This Hugging Face Space serves as a live, interactive sandbox for the NLProxy Pipeline. Instead of just reading about it, you can visually audit how NLProxy protects, compresses, and verifies LLM interactions in real-time.

Upon startup, this Space dynamically clones the official intellideep/nlproxy repository, downloads the required ONNX/NLI models, and exposes the complete 5-Step Lifecycle via a Gradio interface.

📉 The Problem with LLMs Today

Every time you send a prompt to OpenAI, Anthropic, or Gemini, you are doing three dangerous things:

Burning money on redundant words, pleasantries, and verbose context.
Leaking PII (emails, IPs, internal code) to third-party servers.
Exposing yourself to jailbreaks, prompt injections, and semantic drift.

NLProxy fixes all three before the prompt ever leaves your infrastructure.

🎯 Why NLProxy?

💰 Slash Your LLM Bill (Semantic Compression)

NLProxy doesn't just strip stopwords. It uses KMeans/Ward semantic clustering and ONNX-quantized embeddings to understand the meaning of your prompt. It identifies redundant sentences and compresses them, reducing token usage by 40% to 60% without losing critical intent.

Result: A $1,000/month OpenAI bill becomes $400.

🏗️ The 6-Stage Defense Pipeline (Visualized in this Demo)

┌─────────────────────────────────────────────────────────────┐
│                    NLProxy Pipeline                          │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  📥 INPUT: "Ignore instructions... IP 192.168.1.1..."       │
│       ↓                                                      │
│  🛡️ [1] FIREWALL                                            │
│       ├─ PromptFirewall.check_prompt()                      │
│       └─ Action: BLOCK / ALERT / REWRITE / ALLOW            │
│       ↓                                                      │
│  📉 [2] COMPRESS                                            │
│       ├─ CompressionService.compress_batch()                │
│       ├─ Shield → Segment → Cluster → Reconstruct           │
│       └─ Output: "IP: __PROT_xxx. Do NOT use Python..."     │
│       ↓                                                      │
│  🔒 [3] SAFETY                                              │
│       ├─ SafetyChecker.validate()                           │
│       └─ Reinserts critical intents if missing              │
│       ↓                                                      │
│  🤖 [4] LLM CALL (Simulated in this demo)                   │
│       ├─ LLMOrchestrator.generate()                         │
│       └─ OpenAI / Claude / Gemini / Local                   │
│       ↓                                                      │
│  🧹 [5] CORRECT                                             │
│       ├─ ResponseCorrector.correct()                        │
│       └─ Applies FORBID/MANDATE + redacts unauthorized      │
│       ↓                                                      │
│  🔍 [6] VERIFY                                              │
│       ├─ PostLLMVerifier.verify()                           │
│       ├─ NLI contradiction detection                        │
│       └─ Confidence: 0.30 → 0.85 (after auto-correction)    │
│       ↓                                                      │
│  📤 OUTPUT: "Solution in Java. Connection protected."       │
│                                                              │
└─────────────────────────────────────────────────────────────┘

🛡️ Unbreakable Security (Firewall & Verification)

Pre-Flight: A multi-layer firewall blocks jailbreaks, system prompt extraction, and SQLi using regex + semantic attack detection.
Post-Flight: NLI (Natural Language Inference) models verify that the LLM didn't hallucinate forbidden actions or leak unauthorized entities.

Real‑World Use Cases

Use Case	NLProxy Benefit
Chat‑based customer support	Reduces token costs by 50% while preserving mandatory disclaimers and safety rules.
Code generation assistant	Masks API keys and internal IPs; enforces “do not use Python” restrictions.
Legal document analysis	Preserves confidentiality and privilege statements even after heavy compression.
Multi‑tenant SaaS	Semantic cache + domain filtering reduces redundant LLM calls by 70‑80%.
On‑premise deployment	Works fully offline, no external dependencies (optional Redis for cache).

Components

Component	Function
Firewall	Regex + semantic injection detection (jailbreak, system prompt extraction, data exfiltration).
Shield	Entity masking (IPs, emails, codes, PII) and extraction of semantic restrictions (FORBID/MANDATE).
Segmenter	Language‑aware sentence splitting + ONNX‑accelerated sentence embeddings (384‑d MiniLM).
Compressor	Clustering‑based redundancy removal (Ward / K‑Means) with variance filtering.
Reconstructor	Re‑injects masked entities, removes stopwords, and computes token/cost savings.
SafetyChecker	Verifies critical intents/restrictions survive compression; re‑inserts missing sentences.
LLMOrchestrator	Multi‑provider (Gemini, OpenAI, Claude, etc.) with retry, circuit breaker, and rate limiting.
PostLLMVerifier	NLI‑based contradiction detection, unauthorized entity detection, semantic drift monitoring.
ResponseCorrector	Sanitizes LLM output: removes prohibited entities, enforces mandates, redacts placeholders.
Semantic Cache	RedisVL‑powered vector cache (cosine similarity), optional TTL and domain filtering.

Benchmark

Comparison with State‑of‑the‑Art (SOTA)

Solution	Injection Prevention	Entity Masking	Prompt Compression	Restriction Enforcement	Post‑LLM Verification	Offline	Open Source	Multi‑LLM
NLProxy	✅	✅	✅ (semantic)	✅	✅	✅	✅ (BSL 1.1)	✅
LangChain	❌ (no built‑in)	❌	❌ (only templates)	❌	❌	⚠️ partial	✅	✅
Semantic Kernel	❌	❌	❌	❌	❌	⚠️ partial	✅	✅
LLMLingua / Selective Context	❌	❌	✅ (token‑level)	❌	❌	✅	✅	❌
Rebuff (injection)	✅	❌	❌	❌	❌	⚠️	✅	❌
Lakera Guard	✅	✅ (basic)	❌	❌	❌	❌	❌	❌
Azure OpenAI Content Safety	✅	❌	❌	❌	❌	❌	❌	✅

Key differentiators:

NLProxy is the only open‑source solution that combines prompt security, semantic compression, constraint enforcement, and response verification in a single pipeline.
All critical components work offline (embedding & NLI models are downloaded once and run locally).
The business‑friendly BSL 1.1 license allows free use for indie developers, students, and non‑profits, while requiring a commercial license for large enterprises (>$1M revenue).

Compression Efficiency

Metric	Value
Average token reduction (general)	45‑55%
Reduction on legal/finance documents	35‑45% (conservative)
Reduction on code prompts	55‑65%
Compression latency (per prompt)	50‑120 ms (CPU), 20‑40 ms (GPU)
Embedding model	all‑MiniLM‑L6‑v2 (384 dim, ONNX)
Clustering method	Auto‑select Ward (<200 sent) / K‑Means

Security & Verification

Check	Accuracy / Throughput
Injection detection (regex)	>99% on known patterns (MITRE ATLAS)
Semantic injection (embedding)	92% recall @ 0.85 threshold (optional)
Entity masking	100% of IPs, emails, dates, hashes
NLI contradiction detection	78‑85% accuracy (distilroberta‑base)
Restriction enforcement (FORBID)	100% (exact match)
Post‑LLM verification latency	+30‑60 ms per request (NLI enabled)

End‑to‑End Latency

Configuration	P95 Latency (ms)
Compression only (no NLI, no cache)	120‑180
Compression + Firewall + Shield	150‑220
Full pipeline + NLI verification	200‑300
Full pipeline + Semantic Cache (hit)	<10

Scalability

Component	Limit / Sizing Guideline
Max prompt length	100k chars (configurable)
Concurrent requests	Limited by `--workers` + thread pool (default 8)
Embedding batch size	128 sentences (can be increased with more memory)
Redis cache capacity	Unlimited (depends on Redis memory)
Multi‑LLM failover	Supports fallback chains (OpenAI → Claude → Gemini)

📄 License

NLProxy is released under the Business Source License 1.1 (BSL 1.1).

✅ Free for indie developers, students, non‑profits, and small businesses (revenue < $1M).
🏢 Large enterprises (revenue ≥ $1M) require a commercial license – contact us for pricing.
🔓 After five years from the release date, the code automatically converts to Apache 2.0.

See the LICENSE.md file for full text.

💬 Support & Contact

📧 Email: intellideeplabs@gmail.com
💬 Telegram: @itsLerb (click to open) – response within 24h
🐛 Issues: Use GitHub Issues for bugs and feature requests.

We welcome contributions, but please open an issue first to discuss.