---
title: NLProxy Enterprise Demo
emoji: 🛡️
colorFrom: blue
colorTo: gray
sdk: gradio
sdk_version: 4.36.1
app_file: app.py
pinned: false
license: other
tags:
  - llm
  - prompt-compression
  - security
  - firewall
  - nli
  - pii-masking
  - enterprise
---

NLProxy

Prompt Security & Compression Gateway for LLMs

The enterprise-grade, offline-first middleware that cuts your LLM bill by up to 60% while enforcing zero-trust security.

[![License](https://img.shields.io/badge/License-BSL--1.1-red)](https://github.com/intellideep/nlproxy/blob/main/LICENSE.md) [![PyPI](https://img.shields.io/pypi/v/nlproxy)](https://pypi.org/project/nlproxy/) [![GitHub](https://img.shields.io/badge/GitHub-Repository-blue)](https://github.com/intellideep/nlproxy)
---

## 🎛️ About This Interactive Demo

This Hugging Face Space serves as a **live, interactive sandbox** for the NLProxy Pipeline. Instead of just reading about it, you can visually audit how NLProxy protects, compresses, and verifies LLM interactions in real time.

Upon startup, this Space dynamically clones the official [`intellideep/nlproxy`](https://github.com/intellideep/nlproxy) repository, downloads the required ONNX/NLI models, and exposes the complete **5-Step Lifecycle** via a Gradio interface.

---

## 📉 The Problem with LLMs Today

Every time you send a prompt to OpenAI, Anthropic, or Gemini, you are doing three dangerous things:

1. **Burning money** on redundant words, pleasantries, and verbose context.
2. **Leaking PII** (emails, IPs, internal code) to third-party servers.
3. **Exposing yourself** to jailbreaks, prompt injections, and semantic drift.

**NLProxy fixes all three before the prompt ever leaves your infrastructure.**

---

## 🎯 Why NLProxy?

### 💰 Slash Your LLM Bill (Semantic Compression)

NLProxy doesn't just strip stopwords. It uses **KMeans/Ward semantic clustering** and **ONNX-quantized embeddings** to understand the *meaning* of your prompt. It identifies redundant sentences and compresses them, **reducing token usage by 40% to 60%** without losing critical intent.

> *Result: at a 60% reduction, a $1,000/month OpenAI bill becomes $400.*

### 🏗️ The 6-Stage Defense Pipeline (Visualized in this Demo)

```text
┌─────────────────────────────────────────────────────────────┐
│                      NLProxy Pipeline                       │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  📥 INPUT: "Ignore instructions... IP 192.168.1.1..."       │
│      ↓                                                      │
│  🛡️ [1] FIREWALL                                            │
│     ├─ PromptFirewall.check_prompt()                        │
│     └─ Action: BLOCK / ALERT / REWRITE / ALLOW              │
│      ↓                                                      │
│  📉 [2] COMPRESS                                            │
│     ├─ CompressionService.compress_batch()                  │
│     ├─ Shield → Segment → Cluster → Reconstruct             │
│     └─ Output: "IP: __PROT_xxx. Do NOT use Python..."       │
│      ↓                                                      │
│  🔒 [3] SAFETY                                              │
│     ├─ SafetyChecker.validate()                             │
│     └─ Reinserts critical intents if missing                │
│      ↓                                                      │
│  🤖 [4] LLM CALL (Simulated in this demo)                   │
│     ├─ LLMOrchestrator.generate()                           │
│     └─ OpenAI / Claude / Gemini / Local                     │
│      ↓                                                      │
│  🧹 [5] CORRECT                                             │
│     ├─ ResponseCorrector.correct()                          │
│     └─ Applies FORBID/MANDATE + redacts unauthorized        │
│      ↓                                                      │
│  🔍 [6] VERIFY                                              │
│     ├─ PostLLMVerifier.verify()                             │
│     ├─ NLI contradiction detection                          │
│     └─ Confidence: 0.30 → 0.85 (after auto-correction)      │
│      ↓                                                      │
│  📤 OUTPUT: "Solution in Java. Connection protected."       │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```

### 🛡️ Unbreakable Security (Firewall & Verification)

- **Pre-Flight:** A multi-layer firewall blocks jailbreaks, system prompt extraction, and SQLi using regex + semantic attack detection.
- **Post-Flight:** NLI (Natural Language Inference) models verify that the LLM didn't hallucinate forbidden actions or leak unauthorized entities.

### Real‑World Use Cases

| Use Case | NLProxy Benefit |
|----------|-----------------|
| **Chat‑based customer support** | Reduces token costs by 50% while preserving mandatory disclaimers and safety rules. |
| **Code generation assistant** | Masks API keys and internal IPs; enforces “do not use Python” restrictions. |
| **Legal document analysis** | Preserves confidentiality and privilege statements even after heavy compression. |
| **Multi‑tenant SaaS** | Semantic cache + domain filtering reduces redundant LLM calls by 70‑80%. |
| **On‑premise deployment** | Works fully offline, no external dependencies (optional Redis for cache). |

---

# Components

| Component | Function |
|-----------|----------|
| **Firewall** | Regex + semantic injection detection (jailbreak, system prompt extraction, data exfiltration). |
| **Shield** | Entity masking (IPs, emails, codes, PII) and extraction of semantic restrictions (FORBID/MANDATE). |
| **Segmenter** | Language‑aware sentence splitting + ONNX‑accelerated sentence embeddings (384‑d MiniLM). |
| **Compressor** | Clustering‑based redundancy removal (Ward / K‑Means) with variance filtering. |
| **Reconstructor** | Re‑injects masked entities, removes stopwords, and computes token/cost savings. |
| **SafetyChecker** | Verifies critical intents/restrictions survive compression; re‑inserts missing sentences. |
| **LLMOrchestrator** | Multi‑provider (Gemini, OpenAI, Claude, etc.) with retry, circuit breaker, and rate limiting. |
| **PostLLMVerifier** | NLI‑based contradiction detection, unauthorized entity detection, semantic drift monitoring. |
| **ResponseCorrector** | Sanitizes LLM output: removes prohibited entities, enforces mandates, redacts placeholders. |
| **Semantic Cache** | RedisVL‑powered vector cache (cosine similarity), optional TTL and domain filtering. |

---

# Benchmark

## Comparison with State‑of‑the‑Art (SOTA)

| Solution | Injection Prevention | Entity Masking | Prompt Compression | Restriction Enforcement | Post‑LLM Verification | Offline | Open Source | Multi‑LLM |
|----------|:--------------------:|:--------------:|:------------------:|:-----------------------:|:---------------------:|:-------:|:-----------:|:---------:|
| **NLProxy** | ✅ | ✅ | ✅ (semantic) | ✅ | ✅ | ✅ | ✅ (BSL 1.1) | ✅ |
| LangChain | ❌ (no built‑in) | ❌ | ❌ (only templates) | ❌ | ❌ | ⚠️ partial | ✅ | ✅ |
| Semantic Kernel | ❌ | ❌ | ❌ | ❌ | ❌ | ⚠️ partial | ✅ | ✅ |
| LLMLingua / Selective Context | ❌ | ❌ | ✅ (token‑level) | ❌ | ❌ | ✅ | ✅ | ❌ |
| Rebuff (injection) | ✅ | ❌ | ❌ | ❌ | ❌ | ⚠️ | ✅ | ❌ |
| Lakera Guard | ✅ | ✅ (basic) | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Azure OpenAI Content Safety | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |

**Key differentiators:**

- NLProxy is the **only open‑source solution in this comparison** that combines **prompt security, semantic compression, constraint enforcement, and response verification** in a single pipeline.
- All critical components work **offline** (embedding & NLI models are downloaded once and run locally).
- The **business‑friendly BSL 1.1 license** allows free use for indie developers, students, and non‑profits, while requiring a commercial license for large enterprises (≥ $1M revenue).
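
To make the Shield and Reconstructor rows of the Components table concrete, here is a minimal Python sketch of placeholder-based entity masking and re-injection. It is an illustration only, not NLProxy's implementation: the function names `mask_entities` and `unmask_entities`, the two regex patterns, and the zero-padded `__PROT_000` numbering are assumptions, modeled on the `__PROT_xxx` placeholder shown in the pipeline diagram.

```python
import re

# Hypothetical patterns for illustration; a real deployment would cover more entity types.
PATTERNS = {
    "IP": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
}
# Assumed placeholder shape: fixed width, so one placeholder is never a prefix of another.
PLACEHOLDER = re.compile(r"__PROT_\d{3}")


def mask_entities(text):
    """Replace sensitive values with placeholders; return (masked_text, mapping)."""
    mapping = {}  # placeholder -> original value
    reverse = {}  # original value -> placeholder (repeated values share one placeholder)

    def _replace(match):
        value = match.group(0)
        if value not in reverse:
            placeholder = f"__PROT_{len(mapping):03d}"
            mapping[placeholder] = value
            reverse[value] = placeholder
        return reverse[value]

    for pattern in PATTERNS.values():
        text = pattern.sub(_replace, text)
    return text, mapping


def unmask_entities(text, mapping):
    """Re-inject original values; unknown placeholders are left untouched."""
    return PLACEHOLDER.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)


masked, mapping = mask_entities("Contact admin@example.com from 192.168.1.1.")
print(masked)  # Contact __PROT_001 from __PROT_000.
print(unmask_entities(masked, mapping))  # Contact admin@example.com from 192.168.1.1.
```

In a design like this, the mapping stays on the proxy side, so the LLM provider only ever sees the placeholders and the original values are restored after the response comes back.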

### Compression Efficiency

| Metric | Value |
|--------|-------|
| Average token reduction (general) | **45‑55%** |
| Reduction on legal/finance documents | 35‑45% (conservative) |
| Reduction on code prompts | 55‑65% |
| Compression latency (per prompt) | 50‑120 ms (CPU), 20‑40 ms (GPU) |
| Embedding model | all‑MiniLM‑L6‑v2 (384 dim, ONNX) |
| Clustering method | Auto‑select Ward (<200 sentences) / K‑Means |

### Security & Verification

| Check | Accuracy / Throughput |
|-------|-----------------------|
| Injection detection (regex) | >99% on known patterns (MITRE ATLAS) |
| Semantic injection (embedding) | 92% recall @ 0.85 threshold (optional) |
| Entity masking | 100% of IPs, emails, dates, hashes |
| NLI contradiction detection | 78‑85% accuracy (distilroberta‑base) |
| Restriction enforcement (FORBID) | 100% (exact match) |
| Post‑LLM verification latency | +30‑60 ms per request (NLI enabled) |

### End‑to‑End Latency

| Configuration | P95 Latency (ms) |
|---------------|------------------|
| Compression only (no NLI, no cache) | 120‑180 |
| Compression + Firewall + Shield | 150‑220 |
| Full pipeline + NLI verification | 200‑300 |
| Full pipeline + Semantic Cache (hit) | <10 |

### Scalability

| Component | Limit / Sizing Guideline |
|-----------|--------------------------|
| Max prompt length | 100k chars (configurable) |
| Concurrent requests | Limited by `--workers` + thread pool (default 8) |
| Embedding batch size | 128 sentences (can be increased with more memory) |
| Redis cache capacity | Unlimited (depends on Redis memory) |
| Multi‑LLM failover | Supports fallback chains (OpenAI → Claude → Gemini) |

---

## 📄 License

NLProxy is released under the **Business Source License 1.1** (BSL 1.1).

- ✅ Free for **indie developers, students, non‑profits, and small businesses** (revenue < $1M).
- 🏢 **Large enterprises** (revenue ≥ $1M) require a commercial license – contact us for pricing.
- 🔓 After **five years** from the release date, the code automatically converts to **Apache 2.0**.

See the [LICENSE.md](LICENSE.md) file for full text.

---

## 💬 Support & Contact

- 📧 Email: **intellideeplabs@gmail.com**
- 💬 Telegram: [@itsLerb](https://t.me/itsLerb) (click to open) – *response within 24h*
- 🐛 Issues: Use [GitHub Issues](https://github.com/intellideep/nlproxy/issues) for bugs and feature requests.

We welcome contributions, but please open an issue first to discuss.