Spaces:
Running
A newer version of the Gradio SDK is available: 6.19.0
title: NLProxy Enterprise Demo
emoji: π‘οΈ
colorFrom: blue
colorTo: gray
sdk: gradio
sdk_version: 4.36.1
app_file: app.py
pinned: false
license: other
tags:
- llm
- prompt-compression
- security
- firewall
- nli
- pii-masking
- enterprise
NLProxy
Prompt Security & Compression Gateway for LLMs
The enterprise-grade, offline-first middleware that cuts your LLM bill by up to 60% while enforcing zero-trust security.
ποΈ About This Interactive Demo
This Hugging Face Space serves as a live, interactive sandbox for the NLProxy Pipeline. Instead of just reading about it, you can visually audit how NLProxy protects, compresses, and verifies LLM interactions in real-time.
Upon startup, this Space dynamically clones the official intellideep/nlproxy repository, downloads the required ONNX/NLI models, and exposes the complete 5-Step Lifecycle via a Gradio interface.
π The Problem with LLMs Today
Every time you send a prompt to OpenAI, Anthropic, or Gemini, you are doing three dangerous things:
- Burning money on redundant words, pleasantries, and verbose context.
- Leaking PII (emails, IPs, internal code) to third-party servers.
- Exposing yourself to jailbreaks, prompt injections, and semantic drift.
NLProxy fixes all three before the prompt ever leaves your infrastructure.
π― Why NLProxy?
π° Slash Your LLM Bill (Semantic Compression)
NLProxy doesn't just strip stopwords. It uses KMeans/Ward semantic clustering and ONNX-quantized embeddings to understand the meaning of your prompt. It identifies redundant sentences and compresses them, reducing token usage by 40% to 60% without losing critical intent.
Result: A $1,000/month OpenAI bill becomes $400.
ποΈ The 6-Stage Defense Pipeline (Visualized in this Demo)
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β NLProxy Pipeline β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β π₯ INPUT: "Ignore instructions... IP 192.168.1.1..." β
β β β
β π‘οΈ [1] FIREWALL β
β ββ PromptFirewall.check_prompt() β
β ββ Action: BLOCK / ALERT / REWRITE / ALLOW β
β β β
β π [2] COMPRESS β
β ββ CompressionService.compress_batch() β
β ββ Shield β Segment β Cluster β Reconstruct β
β ββ Output: "IP: __PROT_xxx. Do NOT use Python..." β
β β β
β π [3] SAFETY β
β ββ SafetyChecker.validate() β
β ββ Reinserts critical intents if missing β
β β β
β π€ [4] LLM CALL (Simulated in this demo) β
β ββ LLMOrchestrator.generate() β
β ββ OpenAI / Claude / Gemini / Local β
β β β
β π§Ή [5] CORRECT β
β ββ ResponseCorrector.correct() β
β ββ Applies FORBID/MANDATE + redacts unauthorized β
β β β
β π [6] VERIFY β
β ββ PostLLMVerifier.verify() β
β ββ NLI contradiction detection β
β ββ Confidence: 0.30 β 0.85 (after auto-correction) β
β β β
β π€ OUTPUT: "Solution in Java. Connection protected." β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
π‘οΈ Unbreakable Security (Firewall & Verification)
- Pre-Flight: A multi-layer firewall blocks jailbreaks, system prompt extraction, and SQLi using regex + semantic attack detection.
- Post-Flight: NLI (Natural Language Inference) models verify that the LLM didn't hallucinate forbidden actions or leak unauthorized entities.
RealβWorld Use Cases
| Use Case | NLProxy Benefit |
|---|---|
| Chatβbased customer support | Reduces token costs by 50% while preserving mandatory disclaimers and safety rules. |
| Code generation assistant | Masks API keys and internal IPs; enforces βdo not use Pythonβ restrictions. |
| Legal document analysis | Preserves confidentiality and privilege statements even after heavy compression. |
| Multiβtenant SaaS | Semantic cache + domain filtering reduces redundant LLM calls by 70β80%. |
| Onβpremise deployment | Works fully offline, no external dependencies (optional Redis for cache). |
Components
| Component | Function |
|---|---|
| Firewall | Regex + semantic injection detection (jailbreak, system prompt extraction, data exfiltration). |
| Shield | Entity masking (IPs, emails, codes, PII) and extraction of semantic restrictions (FORBID/MANDATE). |
| Segmenter | Languageβaware sentence splitting + ONNXβaccelerated sentence embeddings (384βd MiniLM). |
| Compressor | Clusteringβbased redundancy removal (Ward / KβMeans) with variance filtering. |
| Reconstructor | Reβinjects masked entities, removes stopwords, and computes token/cost savings. |
| SafetyChecker | Verifies critical intents/restrictions survive compression; reβinserts missing sentences. |
| LLMOrchestrator | Multiβprovider (Gemini, OpenAI, Claude, etc.) with retry, circuit breaker, and rate limiting. |
| PostLLMVerifier | NLIβbased contradiction detection, unauthorized entity detection, semantic drift monitoring. |
| ResponseCorrector | Sanitizes LLM output: removes prohibited entities, enforces mandates, redacts placeholders. |
| Semantic Cache | RedisVLβpowered vector cache (cosine similarity), optional TTL and domain filtering. |
Benchmark
Comparison with StateβofβtheβArt (SOTA)
| Solution | Injection Prevention | Entity Masking | Prompt Compression | Restriction Enforcement | PostβLLM Verification | Offline | Open Source | MultiβLLM |
|---|---|---|---|---|---|---|---|---|
| NLProxy | β | β | β (semantic) | β | β | β | β (BSL 1.1) | β |
| LangChain | β (no builtβin) | β | β (only templates) | β | β | β οΈ partial | β | β |
| Semantic Kernel | β | β | β | β | β | β οΈ partial | β | β |
| LLMLingua / Selective Context | β | β | β (tokenβlevel) | β | β | β | β | β |
| Rebuff (injection) | β | β | β | β | β | β οΈ | β | β |
| Lakera Guard | β | β (basic) | β | β | β | β | β | β |
| Azure OpenAI Content Safety | β | β | β | β | β | β | β | β |
Key differentiators:
- NLProxy is the only openβsource solution that combines prompt security, semantic compression, constraint enforcement, and response verification in a single pipeline.
- All critical components work offline (embedding & NLI models are downloaded once and run locally).
- The businessβfriendly BSL 1.1 license allows free use for indie developers, students, and nonβprofits, while requiring a commercial license for large enterprises (>$1M revenue).
Compression Efficiency
| Metric | Value |
|---|---|
| Average token reduction (general) | 45β55% |
| Reduction on legal/finance documents | 35β45% (conservative) |
| Reduction on code prompts | 55β65% |
| Compression latency (per prompt) | 50β120 ms (CPU), 20β40 ms (GPU) |
| Embedding model | allβMiniLMβL6βv2 (384 dim, ONNX) |
| Clustering method | Autoβselect Ward (<200 sent) / KβMeans |
Security & Verification
| Check | Accuracy / Throughput |
|---|---|
| Injection detection (regex) | >99% on known patterns (MITRE ATLAS) |
| Semantic injection (embedding) | 92% recall @ 0.85 threshold (optional) |
| Entity masking | 100% of IPs, emails, dates, hashes |
| NLI contradiction detection | 78β85% accuracy (distilrobertaβbase) |
| Restriction enforcement (FORBID) | 100% (exact match) |
| PostβLLM verification latency | +30β60 ms per request (NLI enabled) |
EndβtoβEnd Latency
| Configuration | P95 Latency (ms) |
|---|---|
| Compression only (no NLI, no cache) | 120β180 |
| Compression + Firewall + Shield | 150β220 |
| Full pipeline + NLI verification | 200β300 |
| Full pipeline + Semantic Cache (hit) | <10 |
Scalability
| Component | Limit / Sizing Guideline |
|---|---|
| Max prompt length | 100k chars (configurable) |
| Concurrent requests | Limited by --workers + thread pool (default 8) |
| Embedding batch size | 128 sentences (can be increased with more memory) |
| Redis cache capacity | Unlimited (depends on Redis memory) |
| MultiβLLM failover | Supports fallback chains (OpenAI β Claude β Gemini) |
π License
NLProxy is released under the Business Source License 1.1 (BSL 1.1).
- β Free for indie developers, students, nonβprofits, and small businesses (revenue < $1M).
- π’ Large enterprises (revenue β₯ $1M) require a commercial license β contact us for pricing.
- π After five years from the release date, the code automatically converts to Apache 2.0.
See the LICENSE.md file for full text.
π¬ Support & Contact
- π§ Email: intellideeplabs@gmail.com
- π¬ Telegram: @itsLerb (click to open) β response within 24h
- π Issues: Use GitHub Issues for bugs and feature requests.
We welcome contributions, but please open an issue first to discuss.