---
title: NLProxy Enterprise Demo
emoji: 🛡️
colorFrom: blue
colorTo: gray
sdk: gradio
sdk_version: 4.36.1
app_file: app.py
pinned: false
license: other
tags:
- llm
- prompt-compression
- security
- firewall
- nli
- pii-masking
- enterprise
---
<div align="center">
<h1>NLProxy</h1>
<p><strong>Prompt Security & Compression Gateway for LLMs</strong></p>
<p><em>The enterprise-grade, offline-first middleware that cuts your LLM bill by up to 60% while enforcing zero-trust security.</em></p>
[![License](https://img.shields.io/badge/License-BSL--1.1-red)](https://github.com/intellideep/nlproxy/blob/main/LICENSE.md)
[![PyPI](https://img.shields.io/pypi/v/nlproxy)](https://pypi.org/project/nlproxy/)
[![GitHub](https://img.shields.io/badge/GitHub-Repository-blue)](https://github.com/intellideep/nlproxy)
</div>
---
## 🎛️ About This Interactive Demo
This Hugging Face Space is a **live, interactive sandbox** for the NLProxy pipeline. Instead of just reading about it, you can visually audit how NLProxy protects, compresses, and verifies LLM interactions in real time.
Upon startup, this Space dynamically clones the official [`intellideep/nlproxy`](https://github.com/intellideep/nlproxy) repository, downloads the required ONNX/NLI models, and exposes the complete **6-Stage Pipeline** via a Gradio interface.
---
## 📉 The Problem with LLMs Today
Every time you send a prompt to OpenAI, Anthropic, or Gemini, you are doing three dangerous things:
1. **Burning money** on redundant words, pleasantries, and verbose context.
2. **Leaking PII** (emails, IPs, internal code) to third-party servers.
3. **Exposing yourself** to jailbreaks, prompt injections, and semantic drift.
**NLProxy fixes all three before the prompt ever leaves your infrastructure.**
---
## 🎯 Why NLProxy?
### 💰 Slash Your LLM Bill (Semantic Compression)
NLProxy doesn't just strip stopwords. It uses **KMeans/Ward semantic clustering** and **ONNX-quantized embeddings** to understand the *meaning* of your prompt. It identifies redundant sentences and compresses them, **reducing token usage by 40% to 60%** without losing critical intent.
> *Result: at a 60% reduction, $1,000/month of input-token spend drops to $400.*
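
The arithmetic behind that figure can be sketched as follows. This is a rough estimate, not a billing model: the function name and the `input_share` parameter are hypothetical, and the sketch assumes compression only affects the input-token part of the bill.

```python
def projected_bill(monthly_bill: float, reduction: float, input_share: float = 1.0) -> float:
    """Estimate the monthly bill after compressing input tokens.

    monthly_bill: current spend in dollars.
    reduction: fraction of input tokens removed (0.40 to 0.60 per the claim above).
    input_share: assumed fraction of the bill spent on input tokens (hypothetical knob).
    """
    if not 0.0 <= reduction <= 1.0 or not 0.0 <= input_share <= 1.0:
        raise ValueError("reduction and input_share must be between 0 and 1")
    input_cost = monthly_bill * input_share
    output_cost = monthly_bill - input_cost  # output tokens are not compressed
    return input_cost * (1.0 - reduction) + output_cost
```

With `input_share` left at 1.0, a 60% reduction reproduces the $400 figure; if only half the bill is input tokens, the same reduction leaves roughly $700.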
### 🏗️ The 6-Stage Defense Pipeline (Visualized in this Demo)
```text
┌─────────────────────────────────────────────────────────────┐
│ NLProxy Pipeline │
├─────────────────────────────────────────────────────────────┤
│ │
│ 📥 INPUT: "Ignore instructions... IP 192.168.1.1..." │
│ ↓ │
│ 🛡️ [1] FIREWALL │
│ ├─ PromptFirewall.check_prompt() │
│ └─ Action: BLOCK / ALERT / REWRITE / ALLOW │
│ ↓ │
│ 📉 [2] COMPRESS │
│ ├─ CompressionService.compress_batch() │
│ ├─ Shield → Segment → Cluster → Reconstruct │
│ └─ Output: "IP: __PROT_xxx. Do NOT use Python..." │
│ ↓ │
│ 🔒 [3] SAFETY │
│ ├─ SafetyChecker.validate() │
│ └─ Reinserts critical intents if missing │
│ ↓ │
│ 🤖 [4] LLM CALL (Simulated in this demo) │
│ ├─ LLMOrchestrator.generate() │
│ └─ OpenAI / Claude / Gemini / Local │
│ ↓ │
│ 🧹 [5] CORRECT │
│ ├─ ResponseCorrector.correct() │
│ └─ Applies FORBID/MANDATE + redacts unauthorized │
│ ↓ │
│ 🔍 [6] VERIFY │
│ ├─ PostLLMVerifier.verify() │
│ ├─ NLI contradiction detection │
│ └─ Confidence: 0.30 → 0.85 (after auto-correction) │
│ ↓ │
│ 📤 OUTPUT: "Solution in Java. Connection protected." │
│ │
└─────────────────────────────────────────────────────────────┘
```
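
The stage ordering in the diagram can be sketched as a plain function that threads a prompt through six callables. This is a minimal illustration, not NLProxy's actual API: `run_pipeline`, `PipelineResult`, and the callable signatures are assumptions, and only the `BLOCK` action is handled (ALERT and REWRITE are treated like ALLOW here).

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PipelineResult:
    output: str
    blocked: bool = False
    confidence: float = 0.0
    trace: List[str] = field(default_factory=list)


def run_pipeline(
    prompt: str,
    firewall: Callable[[str], str],      # returns an action such as "BLOCK" or "ALLOW"
    compress: Callable[[str], str],
    safety: Callable[[str, str], str],   # (original, compressed) -> safe prompt
    llm: Callable[[str], str],
    correct: Callable[[str], str],
    verify: Callable[[str, str], float], # (prompt, response) -> confidence
) -> PipelineResult:
    trace = ["firewall"]
    if firewall(prompt) == "BLOCK":
        # Short-circuit: a blocked prompt never reaches the LLM.
        return PipelineResult(output="", blocked=True, trace=trace)

    compressed = compress(prompt)
    trace.append("compress")
    safe_prompt = safety(prompt, compressed)
    trace.append("safety")
    raw = llm(safe_prompt)
    trace.append("llm")
    corrected = correct(raw)
    trace.append("correct")
    confidence = verify(safe_prompt, corrected)
    trace.append("verify")
    return PipelineResult(corrected, False, confidence, trace)
```

Passing the stages as callables keeps the sketch independent of any concrete class; in the demo the LLM stage is simulated, which corresponds to passing a stub for `llm`.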
### 🛡️ Unbreakable Security (Firewall & Verification)
- **Pre-Flight:** A multi-layer firewall blocks jailbreaks, system prompt extraction, and SQL injection (SQLi) using regex rules plus semantic attack detection.
- **Post-Flight:** NLI (Natural Language Inference) models verify that the LLM didn't hallucinate forbidden actions or leak unauthorized entities.
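
The regex layer of the pre-flight check could look roughly like this. The patterns and the `regex_firewall` name are illustrative assumptions, not NLProxy's rule set, and the semantic (embedding-based) layer is not shown.

```python
import re

# Illustrative patterns only; a production rule set would be much larger.
BLOCK_PATTERNS = [
    # Jailbreak-style instruction override.
    re.compile(r"ignore\s+(all\s+)?(previous|prior|above)?\s*instructions", re.I),
    # System prompt extraction.
    re.compile(r"(reveal|print|show)\s+(your\s+)?system\s+prompt", re.I),
    # Common SQL injection fragments.
    re.compile(r"\bunion\s+select\b|\bdrop\s+table\b", re.I),
]


def regex_firewall(prompt: str) -> str:
    """Return "BLOCK" if any pattern matches, otherwise "ALLOW"."""
    for pattern in BLOCK_PATTERNS:
        if pattern.search(prompt):
            return "BLOCK"
    return "ALLOW"
```

Regex rules are cheap and precise on known phrasings but miss paraphrases, which is why the description above pairs them with semantic detection.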
### Real‑World Use Cases
| Use Case | NLProxy Benefit |
|-----------------------------------|---------------------------------------------------------------------------------|
| **Chat‑based customer support** | Reduces token costs by 50% while preserving mandatory disclaimers and safety rules. |
| **Code generation assistant** | Masks API keys and internal IPs; enforces “do not use Python” restrictions. |
| **Legal document analysis** | Preserves confidentiality and privilege statements even after heavy compression. |
| **Multi‑tenant SaaS** | Semantic cache + domain filtering reduces redundant LLM calls by 70‑80%. |
| **On‑premise deployment** | Works fully offline, no external dependencies (optional Redis for cache). |
---
## Components
| Component | Function |
|-------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------|
| **Firewall** | Regex + semantic injection detection (jailbreak, system prompt extraction, data exfiltration). |
| **Shield** | Entity masking (IPs, emails, codes, PII) and extraction of semantic restrictions (FORBID/MANDATE). |
| **Segmenter** | Language‑aware sentence splitting + ONNX‑accelerated sentence embeddings (384‑d MiniLM). |
| **Compressor** | Clustering‑based redundancy removal (Ward / K‑Means) with variance filtering. |
| **Reconstructor** | Re‑injects masked entities, removes stopwords, and computes token/cost savings. |
| **SafetyChecker** | Verifies critical intents/restrictions survive compression; re‑inserts missing sentences. |
| **LLMOrchestrator** | Multi‑provider (Gemini, OpenAI, Claude, etc.) with retry, circuit breaker, and rate limiting. |
| **PostLLMVerifier** | NLI‑based contradiction detection, unauthorized entity detection, semantic drift monitoring. |
| **ResponseCorrector** | Sanitizes LLM output: removes prohibited entities, enforces mandates, redacts placeholders. |
| **Semantic Cache** | RedisVL‑powered vector cache (cosine similarity), optional TTL and domain filtering. |
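
The mask-then-reinject round trip performed by the Shield and Reconstructor can be sketched as below. The regexes, function names, and the exact `__PROT_000__` placeholder format are assumptions for illustration (the pipeline diagram only shows a `__PROT_xxx` style), and only emails and IPv4 addresses are covered.

```python
import re
from typing import Dict, Tuple

IP_RE = r"\b(?:\d{1,3}\.){3}\d{1,3}\b"
EMAIL_RE = r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"
ENTITY_RE = re.compile(f"{EMAIL_RE}|{IP_RE}")


def mask_entities(text: str) -> Tuple[str, Dict[str, str]]:
    """Replace emails and IPv4 addresses with stable placeholders."""
    token_for_value: Dict[str, str] = {}
    value_for_token: Dict[str, str] = {}

    def _substitute(match: re.Match) -> str:
        value = match.group(0)
        if value not in token_for_value:
            token = f"__PROT_{len(token_for_value):03d}__"
            token_for_value[value] = token
            value_for_token[token] = value
        # The same entity always maps to the same placeholder.
        return token_for_value[value]

    return ENTITY_RE.sub(_substitute, text), value_for_token


def unmask_entities(text: str, mapping: Dict[str, str]) -> str:
    """Reinsert the original values for every placeholder in the mapping."""
    for token, value in mapping.items():
        text = text.replace(token, value)
    return text
```

Keeping the mapping local means the original values never need to leave the machine; only the placeholder text is sent onward.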
---
## Benchmark
### Comparison with State‑of‑the‑Art (SOTA)
| Solution | Injection Prevention | Entity Masking | Prompt Compression | Restriction Enforcement | Post‑LLM Verification | Offline | Open Source | Multi‑LLM |
|-------------------------|:--------------------:|:--------------:|:------------------:|:------------------------:|:---------------------:|:-------:|:-----------:|:---------:|
| **NLProxy** | ✅ | ✅ | ✅ (semantic) | ✅ | ✅ | ✅ | ⚠️ source‑available (BSL 1.1) | ✅ |
| LangChain | ❌ (no built‑in) | ❌ | ❌ (only templates) | ❌ | ❌ | ⚠️ partial | ✅ | ✅ |
| Semantic Kernel | ❌ | ❌ | ❌ | ❌ | ❌ | ⚠️ partial | ✅ | ✅ |
| LLMLingua / Selective Context | ❌ | ❌ | ✅ (token‑level) | ❌ | ❌ | ✅ | ✅ | ❌ |
| Rebuff (injection) | ✅ | ❌ | ❌ | ❌ | ❌ | ⚠️ | ✅ | ❌ |
| Lakera Guard | ✅ | ✅ (basic) | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Azure OpenAI Content Safety | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |
**Key differentiators:**
- NLProxy is the **only source‑available solution** in this comparison that combines **prompt security, semantic compression, constraint enforcement, and response verification** in a single pipeline.
- All critical components work **offline** (embedding & NLI models are downloaded once and run locally).
- The **business‑friendly BSL 1.1 license** allows free use for indie developers, students, non‑profits, and small businesses, while requiring a commercial license for large enterprises (revenue ≥ $1M).
### Compression Efficiency
| Metric | Value |
|-------------------------------------|------------------------------------------|
| Average token reduction (general) | **45‑55%** |
| Reduction on legal/finance documents| 35‑45% (conservative) |
| Reduction on code prompts | 55‑65% |
| Compression latency (per prompt) | 50‑120 ms (CPU), 20‑40 ms (GPU) |
| Embedding model | all‑MiniLM‑L6‑v2 (384 dim, ONNX) |
| Clustering method | Auto‑select Ward (<200 sent) / K‑Means |
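
To make the idea of redundancy removal concrete, here is a deliberately simplified stand-in. It is not Ward or K‑Means clustering and it does not use MiniLM embeddings: it swaps in greedy threshold-based deduplication over bag-of-words vectors so that it runs with the standard library alone. The function names and the 0.8 threshold are assumptions.

```python
import math
import re
from collections import Counter
from typing import List


def _vector(sentence: str) -> Counter:
    """Bag-of-words term counts (a crude substitute for a sentence embedding)."""
    return Counter(re.findall(r"[a-z0-9']+", sentence.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def drop_redundant(sentences: List[str], threshold: float = 0.8) -> List[str]:
    """Keep a sentence only if it is not too similar to one already kept."""
    kept: List[str] = []
    kept_vectors: List[Counter] = []
    for sentence in sentences:
        vec = _vector(sentence)
        if all(_cosine(vec, other) < threshold for other in kept_vectors):
            kept.append(sentence)
            kept_vectors.append(vec)
    return kept
```

A clustering approach differs in that it groups all sentences first and then picks representatives per cluster, but the effect on near-duplicate sentences is similar.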
### Security & Verification
| Check | Accuracy / Throughput |
|------------------------------------|------------------------------------------|
| Injection detection (regex) | >99% on known patterns (MITRE ATLAS) |
| Semantic injection (embedding) | 92% recall @ 0.85 threshold (optional) |
| Entity masking | 100% of IPs, emails, dates, hashes |
| NLI contradiction detection | 78‑85% accuracy (distilroberta‑base) |
| Restriction enforcement (FORBID) | 100% (exact match) |
| Post‑LLM verification latency | +30‑60 ms per request (NLI enabled) |
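
An exact-match FORBID check, as described in the table, might be sketched like this. The `find_forbidden` helper is hypothetical; it assumes restrictions arrive as a list of literal terms and that matching is case-insensitive on whole words.

```python
import re
from typing import List


def find_forbidden(response: str, forbidden_terms: List[str]) -> List[str]:
    """Return the forbidden terms that occur as whole words in the response."""
    hits = []
    for term in forbidden_terms:
        # Lookarounds prevent matching inside a longer word (e.g. "pythonic").
        pattern = rf"(?<!\w){re.escape(term)}(?!\w)"
        if re.search(pattern, response, re.IGNORECASE):
            hits.append(term)
    return hits
```

Exact matching is what makes a 100% figure plausible for literal terms; paraphrases or synonyms of a forbidden term would need the NLI-based check instead.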
### End‑to‑End Latency
| Configuration | P95 Latency (ms) |
|--------------------------------------|------------------|
| Compression only (no NLI, no cache) | 120‑180 |
| Compression + Firewall + Shield | 150‑220 |
| Full pipeline + NLI verification | 200‑300 |
| Full pipeline + Semantic Cache (hit) | <10 |
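
The cache-hit row reflects a lookup that skips the LLM entirely. A minimal in-memory sketch of a cosine-similarity cache with domain filtering follows. It is not the RedisVL-backed implementation: the class name, the 0.9 threshold, and the linear scan are assumptions, and embeddings are supplied by the caller.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemorySemanticCache:
    def __init__(self, threshold: float = 0.9) -> None:
        self.threshold = threshold
        # Each entry is (domain, embedding, cached response).
        self._entries: List[Tuple[str, List[float], str]] = []

    def put(self, domain: str, embedding: Sequence[float], response: str) -> None:
        self._entries.append((domain, list(embedding), response))

    def get(self, domain: str, embedding: Sequence[float]) -> Optional[str]:
        """Return the most similar response in the same domain, if above threshold."""
        best_score, best_response = self.threshold, None
        for entry_domain, entry_embedding, response in self._entries:
            if entry_domain != domain:
                continue  # domain filtering: never serve another domain's answer
            score = cosine(embedding, entry_embedding)
            if score >= best_score:
                best_score, best_response = score, response
        return best_response
```

A vector index replaces the linear scan at scale, but the decision rule (same domain, similarity at or above a threshold) stays the same.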
### Scalability
| Component | Limit / Sizing Guideline |
|--------------------------|-------------------------------------------------------|
| Max prompt length | 100k chars (configurable) |
| Concurrent requests | Limited by `--workers` + thread pool (default 8) |
| Embedding batch size | 128 sentences (can be increased with more memory) |
| Redis cache capacity | Unlimited (depends on Redis memory) |
| Multi‑LLM failover | Supports fallback chains (OpenAI → Claude → Gemini) |
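
A fallback chain such as OpenAI → Claude → Gemini can be sketched as an ordered list of provider callables tried in turn. The function name, the `(name, callable)` pairing, and the retry count are assumptions; circuit breaking and rate limiting are omitted.

```python
from typing import Callable, List, Sequence, Tuple


def generate_with_fallback(
    prompt: str,
    providers: Sequence[Tuple[str, Callable[[str], str]]],
    retries_per_provider: int = 2,
) -> Tuple[str, str]:
    """Try each provider in order; return (provider_name, response) on first success."""
    errors: List[str] = []
    for name, call in providers:
        for attempt in range(1, retries_per_provider + 1):
            try:
                return name, call(prompt)
            except Exception as exc:  # any provider failure triggers retry or fallback
                errors.append(f"{name} attempt {attempt}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Returning the provider name alongside the response makes it easy to log which link in the chain actually served the request.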
---
## 📄 License
NLProxy is released under the **Business Source License 1.1** (BSL 1.1).
- ✅ Free for **indie developers, students, non‑profits, and small businesses** (revenue < $1M).
- 🏢 **Large enterprises** (revenue ≥ $1M) require a commercial license – contact us for pricing.
- 🔓 After **five years** from the release date, the code automatically converts to **Apache 2.0**.
See the [LICENSE.md](LICENSE.md) file for the full text.
---
## 💬 Support & Contact
- 📧 Email: **intellideeplabs@gmail.com**
- 💬 Telegram: [@itsLerb](https://t.me/itsLerb) – *response within 24h*
- 🐛 Issues: Use [GitHub Issues](https://github.com/intellideep/nlproxy/issues) for bugs and feature requests.
We welcome contributions, but please open an issue first to discuss.