--- title: NLProxy Enterprise Demo emoji: 🛡️ colorFrom: blue colorTo: gray sdk: gradio sdk_version: 4.36.1 app_file: app.py pinned: false license: other tags: - llm - prompt-compression - security - firewall - nli - pii-masking - enterprise ---

NLProxy

Prompt Security & Compression Gateway for LLMs

The enterprise-grade, offline-first middleware that cuts your LLM bill by up to 60% while enforcing zero-trust security.

[![License](https://img.shields.io/badge/License-BSL--1.1-red)](https://github.com/intellideep/nlproxy/blob/main/LICENSE.md) [![PyPI](https://img.shields.io/pypi/v/nlproxy)](https://pypi.org/project/nlproxy/) [![GitHub](https://img.shields.io/badge/GitHub-Repository-blue)](https://github.com/intellideep/nlproxy)

--- ## 🎛️ About This Interactive Demo This Hugging Face Space serves as a **live, interactive sandbox** for the NLProxy Pipeline. Instead of just reading about it, you can visually audit how NLProxy protects, compresses, and verifies LLM interactions in real-time. Upon startup, this Space dynamically clones the official [`intellideep/nlproxy`](https://github.com/intellideep/nlproxy) repository, downloads the required ONNX/NLI models, and exposes the complete **5-Step Lifecycle** via a Gradio interface. --- ## 📉 The Problem with LLMs Today Every time you send a prompt to OpenAI, Anthropic, or Gemini, you are doing three dangerous things: 1. **Burning money** on redundant words, pleasantries, and verbose context. 2. **Leaking PII** (emails, IPs, internal code) to third-party servers. 3. **Exposing yourself** to jailbreaks, prompt injections, and semantic drift. **NLProxy fixes all three before the prompt ever leaves your infrastructure.** --- ## 🎯 Why NLProxy? ### 💰 Slash Your LLM Bill (Semantic Compression) NLProxy doesn't just strip stopwords. It uses **KMeans/Ward semantic clustering** and **ONNX-quantized embeddings** to understand the *meaning* of your prompt. It identifies redundant sentences and compresses them, **reducing token usage by 40% to 60%** without losing critical intent. > *Result: A $1,000/month OpenAI bill becomes $400.* ### 🏗️ The 6-Stage Defense Pipeline (Visualized in this Demo) ```text ┌─────────────────────────────────────────────────────────────┐ │ NLProxy Pipeline │ ├─────────────────────────────────────────────────────────────┤ │ │ │ 📥 INPUT: "Ignore instructions... IP 192.168.1.1..." │ │ ↓ │ │ 🛡️ [1] FIREWALL │ │ ├─ PromptFirewall.check_prompt() │ │ └─ Action: BLOCK / ALERT / REWRITE / ALLOW │ │ ↓ │ │ 📉 [2] COMPRESS │ │ ├─ CompressionService.compress_batch() │ │ ├─ Shield → Segment → Cluster → Reconstruct │ │ └─ Output: "IP: __PROT_xxx. Do NOT use Python..." │ │ ↓ │ │ 🔒 [3] SAFETY │ │ ├─ SafetyChecker.validate() │ │ └─ Reinserts critical intents if missing │ │ ↓ │ │ 🤖 [4] LLM CALL (Simulated in this demo) │ │ ├─ LLMOrchestrator.generate() │ │ └─ OpenAI / Claude / Gemini / Local │ │ ↓ │ │ 🧹 [5] CORRECT │ │ ├─ ResponseCorrector.correct() │ │ └─ Applies FORBID/MANDATE + redacts unauthorized │ │ ↓ │ │ 🔍 [6] VERIFY │ │ ├─ PostLLMVerifier.verify() │ │ ├─ NLI contradiction detection │ │ └─ Confidence: 0.30 → 0.85 (after auto-correction) │ │ ↓ │ │ 📤 OUTPUT: "Solution in Java. Connection protected." │ │ │ └─────────────────────────────────────────────────────────────┘ ``` ### 🛡️ Unbreakable Security (Firewall & Verification) - **Pre-Flight:** A multi-layer firewall blocks jailbreaks, system prompt extraction, and SQLi using regex + semantic attack detection. - **Post-Flight:** NLI (Natural Language Inference) models verify that the LLM didn't hallucinate forbidden actions or leak unauthorized entities. ### Real‑World Use Cases | Use Case | NLProxy Benefit | |-----------------------------------|---------------------------------------------------------------------------------| | **Chat‑based customer support** | Reduces token costs by 50% while preserving mandatory disclaimers and safety rules. | | **Code generation assistant** | Masks API keys and internal IPs; enforces “do not use Python” restrictions. | | **Legal document analysis** | Preserves confidentiality and privilege statements even after heavy compression. | | **Multi‑tenant SaaS** | Semantic cache + domain filtering reduces redundant LLM calls by 70‑80%. | | **On‑premise deployment** | Works fully offline, no external dependencies (optional Redis for cache). | --- # Components | Component | Function | |-------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------| | **Firewall** | Regex + semantic injection detection (jailbreak, system prompt extraction, data exfiltration). | | **Shield** | Entity masking (IPs, emails, codes, PII) and extraction of semantic restrictions (FORBID/MANDATE). | | **Segmenter** | Language‑aware sentence splitting + ONNX‑accelerated sentence embeddings (384‑d MiniLM). | | **Compressor** | Clustering‑based redundancy removal (Ward / K‑Means) with variance filtering. | | **Reconstructor** | Re‑injects masked entities, removes stopwords, and computes token/cost savings. | | **SafetyChecker** | Verifies critical intents/restrictions survive compression; re‑inserts missing sentences. | | **LLMOrchestrator** | Multi‑provider (Gemini, OpenAI, Claude, etc.) with retry, circuit breaker, and rate limiting. | | **PostLLMVerifier** | NLI‑based contradiction detection, unauthorized entity detection, semantic drift monitoring. | | **ResponseCorrector** | Sanitizes LLM output: removes prohibited entities, enforces mandates, redacts placeholders. | | **Semantic Cache** | RedisVL‑powered vector cache (cosine similarity), optional TTL and domain filtering. | --- # Benchmark ## Comparison with State‑of‑the‑Art (SOTA) | Solution | Injection Prevention | Entity Masking | Prompt Compression | Restriction Enforcement | Post‑LLM Verification | Offline | Open Source | Multi‑LLM | |-------------------------|:--------------------:|:--------------:|:------------------:|:------------------------:|:---------------------:|:-------:|:-----------:|:---------:| | **NLProxy** | ✅ | ✅ | ✅ (semantic) | ✅ | ✅ | ✅ | ✅ (BSL 1.1)| ✅ | | LangChain | ❌ (no built‑in) | ❌ | ❌ (only templates) | ❌ | ❌ | ⚠️ partial | ✅ | ✅ | | Semantic Kernel | ❌ | ❌ | ❌ | ❌ | ❌ | ⚠️ partial | ✅ | ✅ | | LLMLingua / Selective Context | ❌ | ❌ | ✅ (token‑level) | ❌ | ❌ | ✅ | ✅ | ❌ | | Rebuff (injection) | ✅ | ❌ | ❌ | ❌ | ❌ | ⚠️ | ✅ | ❌ | | Lakera Guard | ✅ | ✅ (basic) | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | | Azure OpenAI Content Safety | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | **Key differentiators:** - NLProxy is the **only open‑source solution** that combines **prompt security, semantic compression, constraint enforcement, and response verification** in a single pipeline. - All critical components work **offline** (embedding & NLI models are downloaded once and run locally). - The **business‑friendly BSL 1.1 license** allows free use for indie developers, students, and non‑profits, while requiring a commercial license for large enterprises (>$1M revenue). ### Compression Efficiency | Metric | Value | |-------------------------------------|------------------------------------------| | Average token reduction (general) | **45‑55%** | | Reduction on legal/finance documents| 35‑45% (conservative) | | Reduction on code prompts | 55‑65% | | Compression latency (per prompt) | 50‑120 ms (CPU), 20‑40 ms (GPU) | | Embedding model | all‑MiniLM‑L6‑v2 (384 dim, ONNX) | | Clustering method | Auto‑select Ward (<200 sent) / K‑Means | ### Security & Verification | Check | Accuracy / Throughput | |------------------------------------|------------------------------------------| | Injection detection (regex) | >99% on known patterns (MITRE ATLAS) | | Semantic injection (embedding) | 92% recall @ 0.85 threshold (optional) | | Entity masking | 100% of IPs, emails, dates, hashes | | NLI contradiction detection | 78‑85% accuracy (distilroberta‑base) | | Restriction enforcement (FORBID) | 100% (exact match) | | Post‑LLM verification latency | +30‑60 ms per request (NLI enabled) | ### End‑to‑End Latency | Configuration | P95 Latency (ms) | |--------------------------------------|------------------| | Compression only (no NLI, no cache) | 120‑180 | | Compression + Firewall + Shield | 150‑220 | | Full pipeline + NLI verification | 200‑300 | | Full pipeline + Semantic Cache (hit) | <10 | ### Scalability | Component | Limit / Sizing Guideline | |--------------------------|-------------------------------------------------------| | Max prompt length | 100k chars (configurable) | | Concurrent requests | Limited by `--workers` + thread pool (default 8) | | Embedding batch size | 128 sentences (can be increased with more memory) | | Redis cache capacity | Unlimited (depends on Redis memory) | | Multi‑LLM failover | Supports fallback chains (OpenAI → Claude → Gemini) | --- ## 📄 License NLProxy is released under the **Business Source License 1.1** (BSL 1.1). - ✅ Free for **indie developers, students, non‑profits, and small businesses** (revenue < $1M). - 🏢 **Large enterprises** (revenue ≥ $1M) require a commercial license – contact us for pricing. - 🔓 After **five years** from the release date, the code automatically converts to **Apache 2.0**. See the [LICENSE.md](LICENSE.md) file for full text. --- ## 💬 Support & Contact - 📧 Email: **intellideeplabs@gmail.com** - 💬 Telegram: [@itsLerb](https://t.me/itsLerb) (click to open) – *response within 24h* - 🐛 Issues: Use [GitHub Issues](https://github.com/intellideep/nlproxy/issues) for bugs and feature requests. We welcome contributions, but please open an issue first to discuss.