Spaces:

IntelliDeep
/

NLProxy

Running

App Files Files Community

NLProxy / README.md

Luiserb

first commit

2129c29 17 days ago

preview code

Raw

History Blame Contribute Delete

14.1 kB

	---
	title: NLProxy Enterprise Demo
	emoji: 🛡️
	colorFrom: blue
	colorTo: gray
	sdk: gradio
	sdk_version: 4.36.1
	app_file: app.py
	pinned: false
	license: other
	tags:
	- llm
	- prompt-compression
	- security
	- firewall
	- nli
	- pii-masking
	- enterprise
	---

	<div align="center">
	<h1>NLProxy</h1>
	<p><strong>Prompt Security & Compression Gateway for LLMs</strong></p>
	<p><em>The enterprise-grade, offline-first middleware that cuts your LLM bill by up to 60% while enforcing zero-trust security.</em></p>

	[![License](https://img.shields.io/badge/License-BSL--1.1-red)](https://github.com/intellideep/nlproxy/blob/main/LICENSE.md)
	[![PyPI](https://img.shields.io/pypi/v/nlproxy)](https://pypi.org/project/nlproxy/)
	[![GitHub](https://img.shields.io/badge/GitHub-Repository-blue)](https://github.com/intellideep/nlproxy)
	</div>

	---

	## 🎛️ About This Interactive Demo
	This Hugging Face Space serves as a live, interactive sandbox for the NLProxy Pipeline. Instead of just reading about it, you can visually audit how NLProxy protects, compresses, and verifies LLM interactions in real-time.

	Upon startup, this Space dynamically clones the official [`intellideep/nlproxy`](https://github.com/intellideep/nlproxy) repository, downloads the required ONNX/NLI models, and exposes the complete 5-Step Lifecycle via a Gradio interface.

	---

	## 📉 The Problem with LLMs Today
	Every time you send a prompt to OpenAI, Anthropic, or Gemini, you are doing three dangerous things:
	1. Burning money on redundant words, pleasantries, and verbose context.
	2. Leaking PII (emails, IPs, internal code) to third-party servers.
	3. Exposing yourself to jailbreaks, prompt injections, and semantic drift.

	NLProxy fixes all three before the prompt ever leaves your infrastructure.

	---

	## 🎯 Why NLProxy?

	### 💰 Slash Your LLM Bill (Semantic Compression)
	NLProxy doesn't just strip stopwords. It uses KMeans/Ward semantic clustering and ONNX-quantized embeddings to understand the meaning of your prompt. It identifies redundant sentences and compresses them, reducing token usage by 40% to 60% without losing critical intent.
	> Result: A $1,000/month OpenAI bill becomes $400.


	### 🏗️ The 6-Stage Defense Pipeline (Visualized in this Demo)

	```text
	┌─────────────────────────────────────────────────────────────┐
	│ NLProxy Pipeline │
	├─────────────────────────────────────────────────────────────┤
	│ │
	│ 📥 INPUT: "Ignore instructions... IP 192.168.1.1..." │
	│ ↓ │
	│ 🛡️ [1] FIREWALL │
	│ ├─ PromptFirewall.check_prompt() │
	│ └─ Action: BLOCK / ALERT / REWRITE / ALLOW │
	│ ↓ │
	│ 📉 [2] COMPRESS │
	│ ├─ CompressionService.compress_batch() │
	│ ├─ Shield → Segment → Cluster → Reconstruct │
	│ └─ Output: "IP: __PROT_xxx. Do NOT use Python..." │
	│ ↓ │
	│ 🔒 [3] SAFETY │
	│ ├─ SafetyChecker.validate() │
	│ └─ Reinserts critical intents if missing │
	│ ↓ │
	│ 🤖 [4] LLM CALL (Simulated in this demo) │
	│ ├─ LLMOrchestrator.generate() │
	│ └─ OpenAI / Claude / Gemini / Local │
	│ ↓ │
	│ 🧹 [5] CORRECT │
	│ ├─ ResponseCorrector.correct() │
	│ └─ Applies FORBID/MANDATE + redacts unauthorized │
	│ ↓ │
	│ 🔍 [6] VERIFY │
	│ ├─ PostLLMVerifier.verify() │
	│ ├─ NLI contradiction detection │
	│ └─ Confidence: 0.30 → 0.85 (after auto-correction) │
	│ ↓ │
	│ 📤 OUTPUT: "Solution in Java. Connection protected." │
	│ │
	└─────────────────────────────────────────────────────────────┘
	```

	### 🛡️ Unbreakable Security (Firewall & Verification)
	- Pre-Flight: A multi-layer firewall blocks jailbreaks, system prompt extraction, and SQLi using regex + semantic attack detection.
	- Post-Flight: NLI (Natural Language Inference) models verify that the LLM didn't hallucinate forbidden actions or leak unauthorized entities.

	### Real‑World Use Cases

	\| Use Case \| NLProxy Benefit \|
	\|-----------------------------------\|---------------------------------------------------------------------------------\|
	\| Chat‑based customer support \| Reduces token costs by 50% while preserving mandatory disclaimers and safety rules. \|
	\| Code generation assistant \| Masks API keys and internal IPs; enforces “do not use Python” restrictions. \|
	\| Legal document analysis \| Preserves confidentiality and privilege statements even after heavy compression. \|
	\| Multi‑tenant SaaS \| Semantic cache + domain filtering reduces redundant LLM calls by 70‑80%. \|
	\| On‑premise deployment \| Works fully offline, no external dependencies (optional Redis for cache). \|


	---

	# Components

	\| Component \| Function \|
	\|-------------------------\|--------------------------------------------------------------------------------------------------------------------------------------------------------------\|
	\| Firewall \| Regex + semantic injection detection (jailbreak, system prompt extraction, data exfiltration). \|
	\| Shield \| Entity masking (IPs, emails, codes, PII) and extraction of semantic restrictions (FORBID/MANDATE). \|
	\| Segmenter \| Language‑aware sentence splitting + ONNX‑accelerated sentence embeddings (384‑d MiniLM). \|
	\| Compressor \| Clustering‑based redundancy removal (Ward / K‑Means) with variance filtering. \|
	\| Reconstructor \| Re‑injects masked entities, removes stopwords, and computes token/cost savings. \|
	\| SafetyChecker \| Verifies critical intents/restrictions survive compression; re‑inserts missing sentences. \|
	\| LLMOrchestrator \| Multi‑provider (Gemini, OpenAI, Claude, etc.) with retry, circuit breaker, and rate limiting. \|
	\| PostLLMVerifier \| NLI‑based contradiction detection, unauthorized entity detection, semantic drift monitoring. \|
	\| ResponseCorrector \| Sanitizes LLM output: removes prohibited entities, enforces mandates, redacts placeholders. \|
	\| Semantic Cache \| RedisVL‑powered vector cache (cosine similarity), optional TTL and domain filtering. \|

	---


	# Benchmark

	## Comparison with State‑of‑the‑Art (SOTA)

	\| Solution \| Injection Prevention \| Entity Masking \| Prompt Compression \| Restriction Enforcement \| Post‑LLM Verification \| Offline \| Open Source \| Multi‑LLM \|
	\|-------------------------\|:--------------------:\|:--------------:\|:------------------:\|:------------------------:\|:---------------------:\|:-------:\|:-----------:\|:---------:\|
	\| NLProxy \| ✅ \| ✅ \| ✅ (semantic) \| ✅ \| ✅ \| ✅ \| ✅ (BSL 1.1)\| ✅ \|
	\| LangChain \| ❌ (no built‑in) \| ❌ \| ❌ (only templates) \| ❌ \| ❌ \| ⚠️ partial \| ✅ \| ✅ \|
	\| Semantic Kernel \| ❌ \| ❌ \| ❌ \| ❌ \| ❌ \| ⚠️ partial \| ✅ \| ✅ \|
	\| LLMLingua / Selective Context \| ❌ \| ❌ \| ✅ (token‑level) \| ❌ \| ❌ \| ✅ \| ✅ \| ❌ \|
	\| Rebuff (injection) \| ✅ \| ❌ \| ❌ \| ❌ \| ❌ \| ⚠️ \| ✅ \| ❌ \|
	\| Lakera Guard \| ✅ \| ✅ (basic) \| ❌ \| ❌ \| ❌ \| ❌ \| ❌ \| ❌ \|
	\| Azure OpenAI Content Safety \| ✅ \| ❌ \| ❌ \| ❌ \| ❌ \| ❌ \| ❌ \| ✅ \|

	Key differentiators:
	- NLProxy is the only open‑source solution that combines prompt security, semantic compression, constraint enforcement, and response verification in a single pipeline.
	- All critical components work offline (embedding & NLI models are downloaded once and run locally).
	- The business‑friendly BSL 1.1 license allows free use for indie developers, students, and non‑profits, while requiring a commercial license for large enterprises (>$1M revenue).

	### Compression Efficiency

	\| Metric \| Value \|
	\|-------------------------------------\|------------------------------------------\|
	\| Average token reduction (general) \| 45‑55% \|
	\| Reduction on legal/finance documents\| 35‑45% (conservative) \|
	\| Reduction on code prompts \| 55‑65% \|
	\| Compression latency (per prompt) \| 50‑120 ms (CPU), 20‑40 ms (GPU) \|
	\| Embedding model \| all‑MiniLM‑L6‑v2 (384 dim, ONNX) \|
	\| Clustering method \| Auto‑select Ward (<200 sent) / K‑Means \|

	### Security & Verification

	\| Check \| Accuracy / Throughput \|
	\|------------------------------------\|------------------------------------------\|
	\| Injection detection (regex) \| >99% on known patterns (MITRE ATLAS) \|
	\| Semantic injection (embedding) \| 92% recall @ 0.85 threshold (optional) \|
	\| Entity masking \| 100% of IPs, emails, dates, hashes \|
	\| NLI contradiction detection \| 78‑85% accuracy (distilroberta‑base) \|
	\| Restriction enforcement (FORBID) \| 100% (exact match) \|
	\| Post‑LLM verification latency \| +30‑60 ms per request (NLI enabled) \|

	### End‑to‑End Latency

	\| Configuration \| P95 Latency (ms) \|
	\|--------------------------------------\|------------------\|
	\| Compression only (no NLI, no cache) \| 120‑180 \|
	\| Compression + Firewall + Shield \| 150‑220 \|
	\| Full pipeline + NLI verification \| 200‑300 \|
	\| Full pipeline + Semantic Cache (hit) \| <10 \|

	### Scalability

	\| Component \| Limit / Sizing Guideline \|
	\|--------------------------\|-------------------------------------------------------\|
	\| Max prompt length \| 100k chars (configurable) \|
	\| Concurrent requests \| Limited by `--workers` + thread pool (default 8) \|
	\| Embedding batch size \| 128 sentences (can be increased with more memory) \|
	\| Redis cache capacity \| Unlimited (depends on Redis memory) \|
	\| Multi‑LLM failover \| Supports fallback chains (OpenAI → Claude → Gemini) \|


	---

	## 📄 License

	NLProxy is released under the Business Source License 1.1 (BSL 1.1).
	- ✅ Free for indie developers, students, non‑profits, and small businesses (revenue < $1M).
	- 🏢 Large enterprises (revenue ≥ $1M) require a commercial license – contact us for pricing.
	- 🔓 After five years from the release date, the code automatically converts to Apache 2.0.

	See the [LICENSE.md](LICENSE.md) file for full text.

	---

	## 💬 Support & Contact

	- 📧 Email: intellideeplabs@gmail.com
	- 💬 Telegram: [@itsLerb](https://t.me/itsLerb) (click to open) – response within 24h
	- 🐛 Issues: Use [GitHub Issues](https://github.com/intellideep/nlproxy/issues) for bugs and feature requests.

	We welcome contributions, but please open an issue first to discuss.