---
title: NLProxy Enterprise Demo
emoji: 🛡️
colorFrom: blue
colorTo: gray
sdk: gradio
sdk_version: 4.36.1
app_file: app.py
pinned: false
license: other
tags:
  - llm
  - prompt-compression
  - security
  - firewall
  - nli
  - pii-masking
  - enterprise
---

<div align="center">
  <h1>NLProxy</h1>
  <p><strong>Prompt Security & Compression Gateway for LLMs</strong></p>
  <p><em>The enterprise-grade, offline-first middleware that cuts your LLM bill by up to 60% while enforcing zero-trust security.</em></p>

  [![License](https://img.shields.io/badge/License-BSL--1.1-red)](https://github.com/intellideep/nlproxy/blob/main/LICENSE.md)
  [![PyPI](https://img.shields.io/pypi/v/nlproxy)](https://pypi.org/project/nlproxy/)
  [![GitHub](https://img.shields.io/badge/GitHub-Repository-blue)](https://github.com/intellideep/nlproxy)
</div>

---

## 🎛️ About This Interactive Demo
This Hugging Face Space serves as a **live, interactive sandbox** for the NLProxy Pipeline. Instead of just reading about it, you can visually audit how NLProxy protects, compresses, and verifies LLM interactions in real time.

Upon startup, this Space dynamically clones the official [`intellideep/nlproxy`](https://github.com/intellideep/nlproxy) repository, downloads the required ONNX/NLI models, and exposes the complete **6-Stage Pipeline** via a Gradio interface.

---

## 📉 The Problem with LLMs Today
Every time you send a prompt to OpenAI, Anthropic, or Gemini, you are doing three dangerous things:
1. **Burning money** on redundant words, pleasantries, and verbose context.
2. **Leaking PII** (emails, IPs, internal code) to third-party servers.
3. **Exposing yourself** to jailbreaks, prompt injections, and semantic drift.

**NLProxy fixes all three before the prompt ever leaves your infrastructure.**

---

## 🎯 Why NLProxy? 

### 💰 Slash Your LLM Bill (Semantic Compression)
NLProxy doesn't just strip stopwords. It uses **KMeans/Ward semantic clustering** and **ONNX-quantized embeddings** to understand the *meaning* of your prompt. It identifies redundant sentences and compresses them, **reducing token usage by 40% to 60%** without losing critical intent. 
> *Result: at the 60% upper bound, a $1,000/month OpenAI bill drops to about $400, assuming the bill is dominated by prompt tokens.*

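The idea behind clustering-based redundancy removal can be sketched in a few lines. This is a toy illustration, not NLProxy's implementation: bag-of-words vectors stand in for the ONNX sentence embeddings, and a greedy similarity threshold stands in for KMeans/Ward clustering. The function names and the 0.6 threshold are assumptions made for the example.

```python
import math
import re
from collections import Counter


def embed(sentence: str) -> Counter:
    """Toy stand-in for a sentence embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9']+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def compress(prompt: str, threshold: float = 0.6) -> str:
    """Keep the first sentence of each group of near-duplicate sentences."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", prompt) if s.strip()]
    kept = []  # list of (sentence, vector) pairs already accepted
    for sentence in sentences:
        vec = embed(sentence)
        # Drop the sentence if it is close to one we already kept.
        if all(cosine(vec, kept_vec) < threshold for _, kept_vec in kept):
            kept.append((sentence, vec))
    return " ".join(s for s, _ in kept)


def savings(original: str, compressed: str) -> float:
    """Fraction of whitespace-delimited words removed (a rough token proxy)."""
    before, after = len(original.split()), len(compressed.split())
    return 1 - after / before if before else 0.0
```

For a prompt such as `"Please write the function in Java. Write the function in Java please. Do not use Python."`, the second sentence is dropped as a duplicate of the first while the restriction is kept. Word counts are only a proxy here; real savings depend on the provider's tokenizer.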

### 🏗️ The 6-Stage Defense Pipeline (Visualized in this Demo)

```text
┌──────────────────────────────────────────────────────────────┐
│                    NLProxy Pipeline                          │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  📥 INPUT: "Ignore instructions... IP 192.168.1.1..."       │
│       ↓                                                      │
│  🛡️ [1] FIREWALL                                            │
│       ├─ PromptFirewall.check_prompt()                      │
│       └─ Action: BLOCK / ALERT / REWRITE / ALLOW            │
│       ↓                                                      │
│  📉 [2] COMPRESS                                            │
│       ├─ CompressionService.compress_batch()                │
│       ├─ Shield → Segment → Cluster → Reconstruct           │
│       └─ Output: "IP: __PROT_xxx. Do NOT use Python..."     │
│       ↓                                                      │
│  🔒 [3] SAFETY                                              │
│       ├─ SafetyChecker.validate()                           │
│       └─ Reinserts critical intents if missing              │
│       ↓                                                      │
│  🤖 [4] LLM CALL (Simulated in this demo)                   │
│       ├─ LLMOrchestrator.generate()                         │
│       └─ OpenAI / Claude / Gemini / Local                   │
│       ↓                                                      │
│  🧹 [5] CORRECT                                             │
│       ├─ ResponseCorrector.correct()                        │
│       └─ Applies FORBID/MANDATE + redacts unauthorized      │
│       ↓                                                      │
│  🔍 [6] VERIFY                                              │
│       ├─ PostLLMVerifier.verify()                           │
│       ├─ NLI contradiction detection                        │
│       └─ Confidence: 0.30 → 0.85 (after auto-correction)    │
│       ↓                                                      │
│  📤 OUTPUT: "Solution in Java. Connection protected."       │
│                                                              │
└──────────────────────────────────────────────────────────────┘
```
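Read top to bottom, the diagram is a chain of stages in which the firewall can stop a request before anything else runs. The sketch below shows only that control flow. Every stage is a hypothetical callable supplied by the caller; none of these signatures are taken from `PromptFirewall`, `CompressionService`, or the other NLProxy classes, and the firewall is simplified to a `"BLOCK"` / `"ALLOW"` decision.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PipelineResult:
    output: str
    blocked: bool = False
    confidence: float = 0.0
    trace: List[str] = field(default_factory=list)  # stages that actually ran


def run_pipeline(
    prompt: str,
    firewall: Callable[[str], str],       # hypothetical: returns "BLOCK" or "ALLOW"
    compress: Callable[[str], str],
    safety: Callable[[str, str], str],    # (original, compressed) -> safe prompt
    llm: Callable[[str], str],
    correct: Callable[[str], str],
    verify: Callable[[str, str], float],  # (prompt, response) -> confidence
) -> PipelineResult:
    trace = ["firewall"]
    if firewall(prompt) == "BLOCK":
        # Nothing downstream runs for a blocked prompt.
        return PipelineResult(output="", blocked=True, trace=trace)

    compressed = compress(prompt)
    trace.append("compress")
    safe_prompt = safety(prompt, compressed)
    trace.append("safety")
    raw = llm(safe_prompt)
    trace.append("llm")
    corrected = correct(raw)
    trace.append("correct")
    confidence = verify(safe_prompt, corrected)
    trace.append("verify")
    return PipelineResult(output=corrected, confidence=confidence, trace=trace)
```

Passing the stages in as callables keeps the example independent of any particular provider and makes it easy to swap in a simulated LLM, as this demo does for stage 4.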

### 🛡️ Unbreakable Security (Firewall & Verification)
- **Pre-Flight:** A multi-layer firewall blocks jailbreaks, system prompt extraction, and SQLi using regex + semantic attack detection.
- **Post-Flight:** NLI (Natural Language Inference) models verify that the LLM didn't hallucinate forbidden actions or leak unauthorized entities.

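As a rough illustration of the regex layer of the pre-flight check, the sketch below maps a prompt to one of the actions shown in the pipeline diagram. The patterns are illustrative assumptions, not NLProxy's rule set, the function name `classify_prompt` is hypothetical, and the semantic detection layer and the REWRITE action are left out.

```python
import re

# Illustrative patterns only; a production rule set would be far larger.
BLOCK_PATTERNS = [
    re.compile(r"ignore\s+(all\s+|any\s+)?(previous|prior|above)\s+instructions", re.I),
    re.compile(r"(reveal|print|show)\s+(me\s+)?(your|the)\s+system\s+prompt", re.I),
]
ALERT_PATTERNS = [
    re.compile(r"\b(union\s+select|drop\s+table)\b", re.I),
]


def classify_prompt(prompt: str) -> str:
    """Return "BLOCK", "ALERT", or "ALLOW" for a prompt (regex layer only)."""
    if any(p.search(prompt) for p in BLOCK_PATTERNS):
        return "BLOCK"
    if any(p.search(prompt) for p in ALERT_PATTERNS):
        return "ALERT"
    return "ALLOW"
```

Regexes like these only catch known phrasings, which is why the list above pairs them with semantic attack detection.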
### Real‑World Use Cases

| Use Case                          | NLProxy Benefit                                                                 |
|-----------------------------------|---------------------------------------------------------------------------------|
| **Chat‑based customer support**   | Reduces token costs by 50% while preserving mandatory disclaimers and safety rules. |
| **Code generation assistant**     | Masks API keys and internal IPs; enforces “do not use Python” restrictions.      |
| **Legal document analysis**       | Preserves confidentiality and privilege statements even after heavy compression. |
| **Multi‑tenant SaaS**             | Semantic cache + domain filtering reduces redundant LLM calls by 70‑80%.         |
| **On‑premise deployment**         | Works fully offline, no external dependencies (optional Redis for cache).        |


---

## Components

| Component               | Function                                                                                                                                                     |
|-------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------|
| **Firewall**            | Regex + semantic injection detection (jailbreak, system prompt extraction, data exfiltration).                                                               |
| **Shield**              | Entity masking (IPs, emails, codes, PII) and extraction of semantic restrictions (FORBID/MANDATE).                                                           |
| **Segmenter**           | Language‑aware sentence splitting + ONNX‑accelerated sentence embeddings (384‑d MiniLM).                                                                     |
| **Compressor**          | Clustering‑based redundancy removal (Ward / K‑Means) with variance filtering.                                                                               |
| **Reconstructor**       | Re‑injects masked entities, removes stopwords, and computes token/cost savings.                                                                              |
| **SafetyChecker**       | Verifies critical intents/restrictions survive compression; re‑inserts missing sentences.                                                                    |
| **LLMOrchestrator**     | Multi‑provider (Gemini, OpenAI, Claude, etc.) with retry, circuit breaker, and rate limiting.                                                                |
| **PostLLMVerifier**     | NLI‑based contradiction detection, unauthorized entity detection, semantic drift monitoring.                                                                 |
| **ResponseCorrector**   | Sanitizes LLM output: removes prohibited entities, enforces mandates, redacts placeholders.                                                                  |
| **Semantic Cache**      | RedisVL‑powered vector cache (cosine similarity), optional TTL and domain filtering.                                                                         |

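The Shield and Reconstructor rows describe a mask-then-restore round trip: sensitive values are swapped for placeholders before compression and the LLM call, then re-injected afterwards. A minimal sketch of that round trip follows, assuming two simple regexes and a numbered placeholder modelled on the `__PROT_xxx` token in the pipeline diagram. The names `mask`, `unmask`, and `ENTITY_PATTERNS` are hypothetical, and the real Shield covers more entity types than shown here.

```python
import re
from typing import Dict, Tuple

# Illustrative patterns; only IPv4-looking strings and emails are handled.
ENTITY_PATTERNS = {
    "IP": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def mask(text: str) -> Tuple[str, Dict[str, str]]:
    """Replace each detected entity with a placeholder; return text and vault."""
    vault: Dict[str, str] = {}

    def _replace(match: "re.Match") -> str:
        placeholder = f"__PROT_{len(vault):03d}"
        vault[placeholder] = match.group(0)  # remember the original value
        return placeholder

    for pattern in ENTITY_PATTERNS.values():
        text = pattern.sub(_replace, text)
    return text, vault


def unmask(text: str, vault: Dict[str, str]) -> str:
    """Re-inject the original values for every placeholder still present."""
    for placeholder, value in vault.items():
        text = text.replace(placeholder, value)
    return text
```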
---


## Benchmark

### Comparison with State‑of‑the‑Art (SOTA)

| Solution                | Injection Prevention | Entity Masking | Prompt Compression | Restriction Enforcement | Post‑LLM Verification | Offline | Open Source | Multi‑LLM |
|-------------------------|:--------------------:|:--------------:|:------------------:|:------------------------:|:---------------------:|:-------:|:-----------:|:---------:|
| **NLProxy**             | ✅                   | ✅             | ✅ (semantic)       | ✅                       | ✅                    | ✅      | ✅ (BSL 1.1)| ✅        |
| LangChain               | ❌ (no built‑in)     | ❌             | ❌ (only templates) | ❌                       | ❌                    | ⚠️ partial | ✅        | ✅        |
| Semantic Kernel         | ❌                   | ❌             | ❌                  | ❌                       | ❌                    | ⚠️ partial | ✅        | ✅        |
| LLMLingua / Selective Context | ❌           | ❌             | ✅ (token‑level)    | ❌                       | ❌                    | ✅      | ✅        | ❌        |
| Rebuff (injection)      | ✅                   | ❌             | ❌                  | ❌                       | ❌                    | ⚠️      | ✅        | ❌        |
| Lakera Guard            | ✅                   | ✅ (basic)     | ❌                  | ❌                       | ❌                    | ❌       | ❌        | ❌        |
| Azure OpenAI Content Safety | ✅             | ❌             | ❌                  | ❌                       | ❌                    | ❌       | ❌        | ✅        |

**Key differentiators:**  
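- Among the tools compared here, NLProxy is the **only source‑available solution** that combines **prompt security, semantic compression, constraint enforcement, and response verification** in a single pipeline (BSL 1.1 publishes the source but is not an OSI‑approved open‑source license).  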
- All critical components work **offline** (embedding & NLI models are downloaded once and run locally).  
- The **business‑friendly BSL 1.1 license** allows free use for indie developers, students, non‑profits, and small businesses (revenue < $1M), while requiring a commercial license for large enterprises (revenue ≥ $1M).

### Compression Efficiency

| Metric                              | Value                                    |
|-------------------------------------|------------------------------------------|
| Average token reduction (general)   | **45‑55%**                               |
| Reduction on legal/finance documents| 35‑45% (conservative)                    |
| Reduction on code prompts           | 55‑65%                                   |
| Compression latency (per prompt)    | 50‑120 ms (CPU), 20‑40 ms (GPU)          |
| Embedding model                     | all‑MiniLM‑L6‑v2 (384 dim, ONNX)         |
| Clustering method                   | Auto‑select Ward (<200 sent) / K‑Means   |

### Security & Verification

| Check                              | Accuracy / Throughput                    |
|------------------------------------|------------------------------------------|
| Injection detection (regex)        | >99% on known patterns (MITRE ATLAS)     |
| Semantic injection (embedding)     | 92% recall @ 0.85 threshold (optional)   |
| Entity masking                     | 100% of IPs, emails, dates, hashes       |
| NLI contradiction detection        | 78‑85% accuracy (distilroberta‑base)     |
| Restriction enforcement (FORBID)   | 100% (exact match)                       |
| Post‑LLM verification latency      | +30‑60 ms per request (NLI enabled)      |

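The FORBID row reports exact-match enforcement, which can be pictured as a literal term check on the LLM response. The sketch below is an assumption about what such a check might look like, not NLProxy's code: `find_violations` is a hypothetical name, and matching is whole-word and case-insensitive for plain-word terms.

```python
import re
from typing import Iterable, List


def _mentions(term: str, text: str) -> bool:
    """Whole-word, case-insensitive match for a literal term."""
    return re.search(rf"\b{re.escape(term)}\b", text, re.IGNORECASE) is not None


def find_violations(
    response: str, forbid: Iterable[str], mandate: Iterable[str]
) -> List[str]:
    """List forbidden terms that appear and mandated terms that are missing."""
    violations = [f"FORBID: {t}" for t in forbid if _mentions(t, response)]
    violations += [f"MANDATE: {t}" for t in mandate if not _mentions(t, response)]
    return violations
```

A literal check like this is deterministic for the terms it is given, but it cannot see paraphrases, which is presumably where the NLI contradiction check in the table comes in.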
### End‑to‑End Latency

| Configuration                        | P95 Latency (ms) |
|--------------------------------------|------------------|
| Compression only (no NLI, no cache)  | 120‑180          |
| Compression + Firewall + Shield      | 150‑220          |
| Full pipeline + NLI verification     | 200‑300          |
| Full pipeline + Semantic Cache (hit) | <10              |

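The cache-hit row is fast because a hit skips compression and the LLM call entirely: the incoming prompt's embedding is compared with stored embeddings, and a sufficiently similar entry is returned directly. The sketch below shows that lookup with an in-memory list standing in for the RedisVL vector index. The class name, the 0.9 threshold, and the linear scan are assumptions for illustration; TTL and domain filtering are omitted.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """In-memory stand-in for a vector cache keyed by prompt embeddings."""

    def __init__(self, threshold: float = 0.9) -> None:
        self.threshold = threshold
        self._entries: List[Tuple[Tuple[float, ...], str]] = []

    def put(self, embedding: Sequence[float], response: str) -> None:
        self._entries.append((tuple(embedding), response))

    def get(self, embedding: Sequence[float]) -> Optional[str]:
        """Return the most similar cached response, or None below the threshold."""
        best_score, best_response = 0.0, None
        for stored, response in self._entries:
            score = cosine(embedding, stored)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None
```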
### Scalability

| Component                | Limit / Sizing Guideline                              |
|--------------------------|-------------------------------------------------------|
| Max prompt length        | 100k chars (configurable)                             |
| Concurrent requests      | Limited by `--workers` + thread pool (default 8)     |
| Embedding batch size     | 128 sentences (can be increased with more memory)    |
| Redis cache capacity     | Unlimited (depends on Redis memory)                  |
| Multi‑LLM failover        | Supports fallback chains (OpenAI → Claude → Gemini)  |

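A fallback chain such as OpenAI → Claude → Gemini amounts to trying providers in order until one answers. The sketch below illustrates the pattern with plain callables. It is not the `LLMOrchestrator` API: `generate_with_fallback` and the provider tuples are hypothetical, and retries, circuit breaking, and rate limiting are left out.

```python
from typing import Callable, List, Sequence, Tuple

# A provider is a (name, callable) pair; the callable takes a prompt and
# returns the completion text, raising an exception on failure.
Provider = Tuple[str, Callable[[str], str]]


def generate_with_fallback(
    prompt: str, providers: Sequence[Provider]
) -> Tuple[str, str]:
    """Return (provider_name, response) from the first provider that succeeds."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real client would catch narrower errors
            errors.append(f"{name}: {exc!r}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Returning the provider name alongside the response makes it possible to log which link in the chain actually served the request.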

---

## 📄 License

NLProxy is released under the **Business Source License 1.1** (BSL 1.1).  
- ✅ Free for **indie developers, students, non‑profits, and small businesses** (revenue < $1M).  
- 🏢 **Large enterprises** (revenue ≥ $1M) require a commercial license – contact us for pricing.  
- 🔓 After **five years** from the release date, the code automatically converts to **Apache 2.0**.

See the [LICENSE.md](LICENSE.md) file for full text.

---

## 💬 Support & Contact

- 📧 Email: **intellideeplabs@gmail.com**  
- 💬 Telegram: [@itsLerb](https://t.me/itsLerb) (click to open) – *response within 24h*  
- 🐛 Issues: Use [GitHub Issues](https://github.com/intellideep/nlproxy/issues) for bugs and feature requests.

We welcome contributions, but please open an issue first to discuss.