---
title: NLProxy Enterprise Demo
emoji: π‘οΈ
colorFrom: blue
colorTo: gray
sdk: gradio
sdk_version: 4.36.1
app_file: app.py
pinned: false
license: other
tags:
- llm
- prompt-compression
- security
- firewall
- nli
- pii-masking
- enterprise
---
NLProxy
Prompt Security & Compression Gateway for LLMs
The enterprise-grade, offline-first middleware that cuts your LLM bill by up to 60% while enforcing zero-trust security.
[](https://github.com/intellideep/nlproxy/blob/main/LICENSE.md)
[](https://pypi.org/project/nlproxy/)
[](https://github.com/intellideep/nlproxy)
---
## ποΈ About This Interactive Demo
This Hugging Face Space serves as a **live, interactive sandbox** for the NLProxy Pipeline. Instead of just reading about it, you can visually audit how NLProxy protects, compresses, and verifies LLM interactions in real-time.
Upon startup, this Space dynamically clones the official [`intellideep/nlproxy`](https://github.com/intellideep/nlproxy) repository, downloads the required ONNX/NLI models, and exposes the complete **5-Step Lifecycle** via a Gradio interface.
---
## π The Problem with LLMs Today
Every time you send a prompt to OpenAI, Anthropic, or Gemini, you are doing three dangerous things:
1. **Burning money** on redundant words, pleasantries, and verbose context.
2. **Leaking PII** (emails, IPs, internal code) to third-party servers.
3. **Exposing yourself** to jailbreaks, prompt injections, and semantic drift.
**NLProxy fixes all three before the prompt ever leaves your infrastructure.**
---
## π― Why NLProxy?
### π° Slash Your LLM Bill (Semantic Compression)
NLProxy doesn't just strip stopwords. It uses **KMeans/Ward semantic clustering** and **ONNX-quantized embeddings** to understand the *meaning* of your prompt. It identifies redundant sentences and compresses them, **reducing token usage by 40% to 60%** without losing critical intent.
> *Result: A $1,000/month OpenAI bill becomes $400.*
### ποΈ The 6-Stage Defense Pipeline (Visualized in this Demo)
```text
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β NLProxy Pipeline β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β π₯ INPUT: "Ignore instructions... IP 192.168.1.1..." β
β β β
β π‘οΈ [1] FIREWALL β
β ββ PromptFirewall.check_prompt() β
β ββ Action: BLOCK / ALERT / REWRITE / ALLOW β
β β β
β π [2] COMPRESS β
β ββ CompressionService.compress_batch() β
β ββ Shield β Segment β Cluster β Reconstruct β
β ββ Output: "IP: __PROT_xxx. Do NOT use Python..." β
β β β
β π [3] SAFETY β
β ββ SafetyChecker.validate() β
β ββ Reinserts critical intents if missing β
β β β
β π€ [4] LLM CALL (Simulated in this demo) β
β ββ LLMOrchestrator.generate() β
β ββ OpenAI / Claude / Gemini / Local β
β β β
β π§Ή [5] CORRECT β
β ββ ResponseCorrector.correct() β
β ββ Applies FORBID/MANDATE + redacts unauthorized β
β β β
β π [6] VERIFY β
β ββ PostLLMVerifier.verify() β
β ββ NLI contradiction detection β
β ββ Confidence: 0.30 β 0.85 (after auto-correction) β
β β β
β π€ OUTPUT: "Solution in Java. Connection protected." β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
```
### π‘οΈ Unbreakable Security (Firewall & Verification)
- **Pre-Flight:** A multi-layer firewall blocks jailbreaks, system prompt extraction, and SQLi using regex + semantic attack detection.
- **Post-Flight:** NLI (Natural Language Inference) models verify that the LLM didn't hallucinate forbidden actions or leak unauthorized entities.
### RealβWorld Use Cases
| Use Case | NLProxy Benefit |
|-----------------------------------|---------------------------------------------------------------------------------|
| **Chatβbased customer support** | Reduces token costs by 50% while preserving mandatory disclaimers and safety rules. |
| **Code generation assistant** | Masks API keys and internal IPs; enforces βdo not use Pythonβ restrictions. |
| **Legal document analysis** | Preserves confidentiality and privilege statements even after heavy compression. |
| **Multiβtenant SaaS** | Semantic cache + domain filtering reduces redundant LLM calls by 70β80%. |
| **Onβpremise deployment** | Works fully offline, no external dependencies (optional Redis for cache). |
---
# Components
| Component | Function |
|-------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------|
| **Firewall** | Regex + semantic injection detection (jailbreak, system prompt extraction, data exfiltration). |
| **Shield** | Entity masking (IPs, emails, codes, PII) and extraction of semantic restrictions (FORBID/MANDATE). |
| **Segmenter** | Languageβaware sentence splitting + ONNXβaccelerated sentence embeddings (384βd MiniLM). |
| **Compressor** | Clusteringβbased redundancy removal (Ward / KβMeans) with variance filtering. |
| **Reconstructor** | Reβinjects masked entities, removes stopwords, and computes token/cost savings. |
| **SafetyChecker** | Verifies critical intents/restrictions survive compression; reβinserts missing sentences. |
| **LLMOrchestrator** | Multiβprovider (Gemini, OpenAI, Claude, etc.) with retry, circuit breaker, and rate limiting. |
| **PostLLMVerifier** | NLIβbased contradiction detection, unauthorized entity detection, semantic drift monitoring. |
| **ResponseCorrector** | Sanitizes LLM output: removes prohibited entities, enforces mandates, redacts placeholders. |
| **Semantic Cache** | RedisVLβpowered vector cache (cosine similarity), optional TTL and domain filtering. |
---
# Benchmark
## Comparison with StateβofβtheβArt (SOTA)
| Solution | Injection Prevention | Entity Masking | Prompt Compression | Restriction Enforcement | PostβLLM Verification | Offline | Open Source | MultiβLLM |
|-------------------------|:--------------------:|:--------------:|:------------------:|:------------------------:|:---------------------:|:-------:|:-----------:|:---------:|
| **NLProxy** | β
| β
| β
(semantic) | β
| β
| β
| β
(BSL 1.1)| β
|
| LangChain | β (no builtβin) | β | β (only templates) | β | β | β οΈ partial | β
| β
|
| Semantic Kernel | β | β | β | β | β | β οΈ partial | β
| β
|
| LLMLingua / Selective Context | β | β | β
(tokenβlevel) | β | β | β
| β
| β |
| Rebuff (injection) | β
| β | β | β | β | β οΈ | β
| β |
| Lakera Guard | β
| β
(basic) | β | β | β | β | β | β |
| Azure OpenAI Content Safety | β
| β | β | β | β | β | β | β
|
**Key differentiators:**
- NLProxy is the **only openβsource solution** that combines **prompt security, semantic compression, constraint enforcement, and response verification** in a single pipeline.
- All critical components work **offline** (embedding & NLI models are downloaded once and run locally).
- The **businessβfriendly BSL 1.1 license** allows free use for indie developers, students, and nonβprofits, while requiring a commercial license for large enterprises (>$1M revenue).
### Compression Efficiency
| Metric | Value |
|-------------------------------------|------------------------------------------|
| Average token reduction (general) | **45β55%** |
| Reduction on legal/finance documents| 35β45% (conservative) |
| Reduction on code prompts | 55β65% |
| Compression latency (per prompt) | 50β120 ms (CPU), 20β40 ms (GPU) |
| Embedding model | allβMiniLMβL6βv2 (384 dim, ONNX) |
| Clustering method | Autoβselect Ward (<200 sent) / KβMeans |
### Security & Verification
| Check | Accuracy / Throughput |
|------------------------------------|------------------------------------------|
| Injection detection (regex) | >99% on known patterns (MITRE ATLAS) |
| Semantic injection (embedding) | 92% recall @ 0.85 threshold (optional) |
| Entity masking | 100% of IPs, emails, dates, hashes |
| NLI contradiction detection | 78β85% accuracy (distilrobertaβbase) |
| Restriction enforcement (FORBID) | 100% (exact match) |
| PostβLLM verification latency | +30β60 ms per request (NLI enabled) |
### EndβtoβEnd Latency
| Configuration | P95 Latency (ms) |
|--------------------------------------|------------------|
| Compression only (no NLI, no cache) | 120β180 |
| Compression + Firewall + Shield | 150β220 |
| Full pipeline + NLI verification | 200β300 |
| Full pipeline + Semantic Cache (hit) | <10 |
### Scalability
| Component | Limit / Sizing Guideline |
|--------------------------|-------------------------------------------------------|
| Max prompt length | 100k chars (configurable) |
| Concurrent requests | Limited by `--workers` + thread pool (default 8) |
| Embedding batch size | 128 sentences (can be increased with more memory) |
| Redis cache capacity | Unlimited (depends on Redis memory) |
| MultiβLLM failover | Supports fallback chains (OpenAI β Claude β Gemini) |
---
## π License
NLProxy is released under the **Business Source License 1.1** (BSL 1.1).
- β
Free for **indie developers, students, nonβprofits, and small businesses** (revenue < $1M).
- π’ **Large enterprises** (revenue β₯ $1M) require a commercial license β contact us for pricing.
- π After **five years** from the release date, the code automatically converts to **Apache 2.0**.
See the [LICENSE.md](LICENSE.md) file for full text.
---
## π¬ Support & Contact
- π§ Email: **intellideeplabs@gmail.com**
- π¬ Telegram: [@itsLerb](https://t.me/itsLerb) (click to open) β *response within 24h*
- π Issues: Use [GitHub Issues](https://github.com/intellideep/nlproxy/issues) for bugs and feature requests.
We welcome contributions, but please open an issue first to discuss.