Spaces:

IntelliDeep
/

NLProxy

Running

File size: 14,149 Bytes

---
title: NLProxy Enterprise Demo
emoji: 🛡️
colorFrom: blue
colorTo: gray
sdk: gradio
sdk_version: 4.36.1
app_file: app.py
pinned: false
license: other
tags:
  - llm
  - prompt-compression
  - security
  - firewall
  - nli
  - pii-masking
  - enterprise
---

<div align="center">
  <h1>NLProxy</h1>
  <p><strong>Prompt Security & Compression Gateway for LLMs</strong></p>
  <p><em>The enterprise-grade, offline-first middleware that cuts your LLM bill by up to 60% while enforcing zero-trust security.</em></p>

  [![License](https://img.shields.io/badge/License-BSL--1.1-red)](https://github.com/intellideep/nlproxy/blob/main/LICENSE.md)
  [![PyPI](https://img.shields.io/pypi/v/nlproxy)](https://pypi.org/project/nlproxy/)
  [![GitHub](https://img.shields.io/badge/GitHub-Repository-blue)](https://github.com/intellideep/nlproxy)
</div>

---

## 🎛️ About This Interactive Demo
This Hugging Face Space serves as a **live, interactive sandbox** for the NLProxy Pipeline. Instead of just reading about it, you can visually audit how NLProxy protects, compresses, and verifies LLM interactions in real-time. 

Upon startup, this Space dynamically clones the official [`intellideep/nlproxy`](https://github.com/intellideep/nlproxy) repository, downloads the required ONNX/NLI models, and exposes the complete **5-Step Lifecycle** via a Gradio interface.

---

## 📉 The Problem with LLMs Today
Every time you send a prompt to OpenAI, Anthropic, or Gemini, you are doing three dangerous things:
1. **Burning money** on redundant words, pleasantries, and verbose context.
2. **Leaking PII** (emails, IPs, internal code) to third-party servers.
3. **Exposing yourself** to jailbreaks, prompt injections, and semantic drift.

**NLProxy fixes all three before the prompt ever leaves your infrastructure.**

---

## 🎯 Why NLProxy? 

### 💰 Slash Your LLM Bill (Semantic Compression)
NLProxy doesn't just strip stopwords. It uses **KMeans/Ward semantic clustering** and **ONNX-quantized embeddings** to understand the *meaning* of your prompt. It identifies redundant sentences and compresses them, **reducing token usage by 40% to 60%** without losing critical intent. 
> *Result: A $1,000/month OpenAI bill becomes $400.*


### 🏗️ The 6-Stage Defense Pipeline (Visualized in this Demo)

```text
┌─────────────────────────────────────────────────────────────┐
│                    NLProxy Pipeline                          │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  📥 INPUT: "Ignore instructions... IP 192.168.1.1..."       │
│       ↓                                                      │
│  🛡️ [1] FIREWALL                                            │
│       ├─ PromptFirewall.check_prompt()                      │
│       └─ Action: BLOCK / ALERT / REWRITE / ALLOW            │
│       ↓                                                      │
│  📉 [2] COMPRESS                                            │
│       ├─ CompressionService.compress_batch()                │
│       ├─ Shield → Segment → Cluster → Reconstruct           │
│       └─ Output: "IP: __PROT_xxx. Do NOT use Python..."     │
│       ↓                                                      │
│  🔒 [3] SAFETY                                              │
│       ├─ SafetyChecker.validate()                           │
│       └─ Reinserts critical intents if missing              │
│       ↓                                                      │
│  🤖 [4] LLM CALL (Simulated in this demo)                   │
│       ├─ LLMOrchestrator.generate()                         │
│       └─ OpenAI / Claude / Gemini / Local                   │
│       ↓                                                      │
│  🧹 [5] CORRECT                                             │
│       ├─ ResponseCorrector.correct()                        │
│       └─ Applies FORBID/MANDATE + redacts unauthorized      │
│       ↓                                                      │
│  🔍 [6] VERIFY                                              │
│       ├─ PostLLMVerifier.verify()                           │
│       ├─ NLI contradiction detection                        │
│       └─ Confidence: 0.30 → 0.85 (after auto-correction)    │
│       ↓                                                      │
│  📤 OUTPUT: "Solution in Java. Connection protected."       │
│                                                              │
└─────────────────────────────────────────────────────────────┘
```

### 🛡️ Unbreakable Security (Firewall & Verification)
- **Pre-Flight:** A multi-layer firewall blocks jailbreaks, system prompt extraction, and SQLi using regex + semantic attack detection.
- **Post-Flight:** NLI (Natural Language Inference) models verify that the LLM didn't hallucinate forbidden actions or leak unauthorized entities.

### Real‑World Use Cases

| Use Case                          | NLProxy Benefit                                                                 |
|-----------------------------------|---------------------------------------------------------------------------------|
| **Chat‑based customer support**   | Reduces token costs by 50% while preserving mandatory disclaimers and safety rules. |
| **Code generation assistant**     | Masks API keys and internal IPs; enforces “do not use Python” restrictions.      |
| **Legal document analysis**       | Preserves confidentiality and privilege statements even after heavy compression. |
| **Multi‑tenant SaaS**             | Semantic cache + domain filtering reduces redundant LLM calls by 70‑80%.         |
| **On‑premise deployment**         | Works fully offline, no external dependencies (optional Redis for cache).        |


---

# Components

| Component               | Function                                                                                                                                                     |
|-------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------|
| **Firewall**            | Regex + semantic injection detection (jailbreak, system prompt extraction, data exfiltration).                                                               |
| **Shield**              | Entity masking (IPs, emails, codes, PII) and extraction of semantic restrictions (FORBID/MANDATE).                                                           |
| **Segmenter**           | Language‑aware sentence splitting + ONNX‑accelerated sentence embeddings (384‑d MiniLM).                                                                     |
| **Compressor**          | Clustering‑based redundancy removal (Ward / K‑Means) with variance filtering.                                                                               |
| **Reconstructor**       | Re‑injects masked entities, removes stopwords, and computes token/cost savings.                                                                              |
| **SafetyChecker**       | Verifies critical intents/restrictions survive compression; re‑inserts missing sentences.                                                                    |
| **LLMOrchestrator**     | Multi‑provider (Gemini, OpenAI, Claude, etc.) with retry, circuit breaker, and rate limiting.                                                                |
| **PostLLMVerifier**     | NLI‑based contradiction detection, unauthorized entity detection, semantic drift monitoring.                                                                 |
| **ResponseCorrector**   | Sanitizes LLM output: removes prohibited entities, enforces mandates, redacts placeholders.                                                                  |
| **Semantic Cache**      | RedisVL‑powered vector cache (cosine similarity), optional TTL and domain filtering.                                                                         |

---


# Benchmark

## Comparison with State‑of‑the‑Art (SOTA)

| Solution                | Injection Prevention | Entity Masking | Prompt Compression | Restriction Enforcement | Post‑LLM Verification | Offline | Open Source | Multi‑LLM |
|-------------------------|:--------------------:|:--------------:|:------------------:|:------------------------:|:---------------------:|:-------:|:-----------:|:---------:|
| **NLProxy**             | ✅                   | ✅             | ✅ (semantic)       | ✅                       | ✅                    | ✅      | ✅ (BSL 1.1)| ✅        |
| LangChain               | ❌ (no built‑in)     | ❌             | ❌ (only templates) | ❌                       | ❌                    | ⚠️ partial | ✅        | ✅        |
| Semantic Kernel         | ❌                   | ❌             | ❌                  | ❌                       | ❌                    | ⚠️ partial | ✅        | ✅        |
| LLMLingua / Selective Context | ❌           | ❌             | ✅ (token‑level)    | ❌                       | ❌                    | ✅      | ✅        | ❌        |
| Rebuff (injection)      | ✅                   | ❌             | ❌                  | ❌                       | ❌                    | ⚠️      | ✅        | ❌        |
| Lakera Guard            | ✅                   | ✅ (basic)     | ❌                  | ❌                       | ❌                    | ❌       | ❌        | ❌        |
| Azure OpenAI Content Safety | ✅             | ❌             | ❌                  | ❌                       | ❌                    | ❌       | ❌        | ✅        |

**Key differentiators:**  
- NLProxy is the **only open‑source solution** that combines **prompt security, semantic compression, constraint enforcement, and response verification** in a single pipeline.  
- All critical components work **offline** (embedding & NLI models are downloaded once and run locally).  
- The **business‑friendly BSL 1.1 license** allows free use for indie developers, students, and non‑profits, while requiring a commercial license for large enterprises (>$1M revenue).

### Compression Efficiency

| Metric                              | Value                                    |
|-------------------------------------|------------------------------------------|
| Average token reduction (general)   | **45‑55%**                               |
| Reduction on legal/finance documents| 35‑45% (conservative)                    |
| Reduction on code prompts           | 55‑65%                                   |
| Compression latency (per prompt)    | 50‑120 ms (CPU), 20‑40 ms (GPU)          |
| Embedding model                     | all‑MiniLM‑L6‑v2 (384 dim, ONNX)         |
| Clustering method                   | Auto‑select Ward (<200 sent) / K‑Means   |

### Security & Verification

| Check                              | Accuracy / Throughput                    |
|------------------------------------|------------------------------------------|
| Injection detection (regex)        | >99% on known patterns (MITRE ATLAS)     |
| Semantic injection (embedding)     | 92% recall @ 0.85 threshold (optional)   |
| Entity masking                     | 100% of IPs, emails, dates, hashes       |
| NLI contradiction detection        | 78‑85% accuracy (distilroberta‑base)     |
| Restriction enforcement (FORBID)   | 100% (exact match)                       |
| Post‑LLM verification latency      | +30‑60 ms per request (NLI enabled)      |

### End‑to‑End Latency

| Configuration                        | P95 Latency (ms) |
|--------------------------------------|------------------|
| Compression only (no NLI, no cache)  | 120‑180          |
| Compression + Firewall + Shield      | 150‑220          |
| Full pipeline + NLI verification     | 200‑300          |
| Full pipeline + Semantic Cache (hit) | <10              |

### Scalability

| Component                | Limit / Sizing Guideline                              |
|--------------------------|-------------------------------------------------------|
| Max prompt length        | 100k chars (configurable)                             |
| Concurrent requests      | Limited by `--workers` + thread pool (default 8)     |
| Embedding batch size     | 128 sentences (can be increased with more memory)    |
| Redis cache capacity     | Unlimited (depends on Redis memory)                  |
| Multi‑LLM failover        | Supports fallback chains (OpenAI → Claude → Gemini)  |


---

## 📄 License

NLProxy is released under the **Business Source License 1.1** (BSL 1.1).  
- ✅ Free for **indie developers, students, non‑profits, and small businesses** (revenue < $1M).  
- 🏢 **Large enterprises** (revenue ≥ $1M) require a commercial license – contact us for pricing.  
- 🔓 After **five years** from the release date, the code automatically converts to **Apache 2.0**.

See the [LICENSE.md](LICENSE.md) file for full text.

---

## 💬 Support & Contact

- 📧 Email: **intellideeplabs@gmail.com**  
- 💬 Telegram: [@itsLerb](https://t.me/itsLerb) (click to open) – *response within 24h*  
- 🐛 Issues: Use [GitHub Issues](https://github.com/intellideep/nlproxy/issues) for bugs and feature requests.

We welcome contributions, but please open an issue first to discuss.