NLProxy Service Module Reference

This document describes the compression orchestration service in service/compression.py.

Purpose

CompressionService coordinates the full prompt compression workflow, including shielding, segmentation, compression, reconstruction, and safety validation.

Primary Class

`CompressionService`

Responsibilities

Orchestrates prompt transformation across multiple core modules.
Executes shielding, semantic segmentation, compression, and reconstruction stages.
Provides thread pool parallelism for batch workloads.
Optionally integrates Redis-backed semantic caching.
Controls privacy mode and NLI refinement.

Constructor

CompressionService(
    use_cache: bool = True,
    device: Optional[str] = None,
    redis_url: Optional[str] = None,
    nli_refinement_fn: Optional = None,
    privacy_mode: bool = False,
    models_dir: Optional[Path] = None,
    llm_default_model: Optional[str] = None,
    thread_pool_workers: Optional[int] = None,
)

Key Behaviors

Builds a thread pool via ThreadPoolExecutor(max_workers=self.thread_pool_workers).
Reads NLPROXY_COMPRESSION_WORKERS to override default worker count.
Initializes PromptShield, SemanticSegmenter, SemanticCompressor, PromptReconstructor, and SafetyChecker.
Optionally initializes SemanticLLMCache if Redis is configured.
Caches shield and embedding results in memory when use_cache=True.
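
The worker-count behavior above can be sketched roughly as follows. This is a minimal illustration, not the service's actual code: the helper names resolve_worker_count and build_executor, the fallback default of 4, and the precedence order (explicit argument, then environment variable, then default) are all assumptions.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Optional

ENV_VAR = "NLPROXY_COMPRESSION_WORKERS"


def resolve_worker_count(explicit: Optional[int] = None, default: int = 4) -> int:
    """Pick a worker count.

    Assumed precedence (not confirmed by the source):
    explicit argument, then environment variable, then default.
    """
    if explicit is not None and explicit > 0:
        return explicit
    raw = os.environ.get(ENV_VAR)
    if raw is not None:
        try:
            value = int(raw)
            if value > 0:
                return value
        except ValueError:
            pass  # Malformed value: fall through to the default.
    return default


def build_executor(explicit: Optional[int] = None) -> ThreadPoolExecutor:
    """Create the pool the same way the doc describes: max_workers=<resolved count>."""
    return ThreadPoolExecutor(max_workers=resolve_worker_count(explicit))
```

Ignoring malformed or non-positive values instead of raising is a design choice in this sketch; a stricter service might prefer to fail fast at startup.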

Pipeline Stages

Shielding: PromptShield protects sensitive text and extracts restrictions.
Segmentation: SemanticSegmenter splits text into sentences and encodes them.
Compression: SemanticCompressor selects representative sentence clusters.
Reconstruction: PromptReconstructor rebuilds prompt text and computes metrics.
Safety: SafetyChecker validates intent preservation and, optionally, perplexity.
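
To make the stage order concrete, here is a toy end-to-end sketch. Every function below is a hypothetical stand-in for the real module named above: the shield only masks e-mail addresses, the segmenter splits on punctuation without encoding anything, and the "compressor" merely drops exact duplicate sentences instead of clustering embeddings.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


@dataclass
class Result:
    text: str
    ratio: float  # compressed length / original length, in characters
    safe: bool


def shield(text: str) -> Tuple[str, Dict[str, str]]:
    """Stand-in for PromptShield: swap e-mail addresses for placeholders."""
    vault: Dict[str, str] = {}

    def _mask(match: re.Match) -> str:
        key = f"<P{len(vault)}>"
        vault[key] = match.group(0)
        return key

    return EMAIL.sub(_mask, text), vault


def segment(text: str) -> List[str]:
    """Stand-in for SemanticSegmenter: split after ., ! or ? (no embeddings)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def compress(sentences: List[str]) -> List[str]:
    """Stand-in for SemanticCompressor: keep the first copy of each sentence."""
    seen, kept = set(), []
    for sentence in sentences:
        key = sentence.lower()
        if key not in seen:
            seen.add(key)
            kept.append(sentence)
    return kept


def reconstruct(sentences: List[str], vault: Dict[str, str]) -> str:
    """Stand-in for PromptReconstructor: rejoin and restore protected text."""
    text = " ".join(sentences)
    for key, value in vault.items():
        text = text.replace(key, value)
    return text


def run_pipeline(prompt: str) -> Result:
    masked, vault = shield(prompt)
    text = reconstruct(compress(segment(masked)), vault)
    ratio = len(text) / len(prompt) if prompt else 0.0
    # Stand-in for SafetyChecker: output is non-empty and keeps protected values.
    safe = bool(text) and all(value in text for value in vault.values())
    return Result(text, ratio, safe)
```

Calling run_pipeline("Email bob@example.com today. Be brief. Be brief.") keeps the address intact and drops the repeated sentence; an empty prompt yields an empty, unsafe result in this sketch.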

Parallel Execution

Uses ThreadPoolExecutor for parallel shield and compression tasks.
Submits _shield_with_cache and _process_single jobs concurrently.
Collects results with as_completed().
Ensures blocking CPU-bound operations do not stall the event loop.
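
The submit-and-collect pattern described above might look like the following sketch. The process_batch name and the worker callable are hypothetical placeholders for _shield_with_cache or _process_single; only ThreadPoolExecutor and as_completed come from the source.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List


def process_batch(
    prompts: List[str],
    worker: Callable[[str], str],
    max_workers: int = 4,
) -> List[str]:
    """Run `worker` on each prompt in a thread pool; return results in input order."""
    results: Dict[int, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future to its input index, because as_completed()
        # yields futures in completion order, not submission order.
        futures = {pool.submit(worker, p): i for i, p in enumerate(prompts)}
        for future in as_completed(futures):
            # result() re-raises any exception thrown inside the worker.
            results[futures[future]] = future.result()
    return [results[i] for i in range(len(prompts))]
```

Keeping the index alongside each future is what lets the caller consume results as soon as they finish while still returning them in the original prompt order.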

Performance Characteristics

Latency is dominated by embedding generation and LLM inference.
Batch latency scales roughly as O(N · T_stage / M), where N is the prompt count, T_stage is the per-prompt time spent in the pipeline stages, and M is the worker count.
Effective compression aggressiveness adapts based on NLI confidence and domain mode.
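
As a rough worked example of the batch estimate, the sketch below assumes an idealized model in which every prompt costs the same stage time and the workers process prompts in full waves. The function name and the numbers are illustrative only.

```python
import math


def estimate_batch_seconds(n_prompts: int, stage_seconds: float, workers: int) -> float:
    """Idealized wall-clock estimate: prompts run in ceil(N / M) waves of T_stage each."""
    if workers < 1:
        raise ValueError("workers must be >= 1")
    return math.ceil(n_prompts / workers) * stage_seconds


# 100 prompts at 0.25 s each on 8 workers: 13 waves, versus 25 s sequentially.
print(estimate_batch_seconds(100, 0.25, 8))  # 3.25
```

Real batches will deviate from this: cache hits shorten some prompts, and threads only run in parallel to the extent that the underlying work releases the GIL.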

Configuration

NLPROXY_COMPRESSION_WORKERS controls thread pool size.
privacy_mode toggles strict handling of protected entities.
redis_url enables distributed semantic cache.
models_dir defines the local model artifact directory.
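
A sketch of how an optional redis_url could be turned into a cache client, with a fallback to no cache, is shown below. It uses the redis-py client directly as a stand-in for SemanticLLMCache; the connect_cache helper and the one-second timeout are assumptions.

```python
from typing import Any, Optional

try:
    import redis  # Optional dependency, as listed under Dependencies.
except ImportError:
    redis = None


def connect_cache(redis_url: Optional[str]) -> Optional[Any]:
    """Return a connected Redis client, or None when caching should be disabled."""
    if redis is None or not redis_url:
        return None
    try:
        client = redis.Redis.from_url(redis_url, socket_connect_timeout=1)
        client.ping()  # Force a round trip so a dead server is detected now.
        return client
    except (redis.RedisError, ValueError):
        # Unreachable server or malformed URL: run without the semantic cache.
        return None
```

Returning None instead of raising lets the caller treat "no Redis configured" and "Redis unreachable" the same way.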

Dependencies

numpy
redis (optional)
sentence_transformers
optimum.onnxruntime (for ONNX segmenter backends)

Edge Cases

Empty prompts return a result with zero tokens and a safety alert.
If Redis is unavailable, the service falls back to running with the semantic cache disabled.
compress_batch_async must be called from within a running async event loop.
Compression failures are retried up to configured limits in the API layer.
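
The event-loop requirement can be illustrated with the sketch below, which assumes asyncio. The coroutine compress_batch_async_sketch is a hypothetical stand-in, not the real compress_batch_async: it shows one common way to offload blocking work to a thread pool, and how synchronous code can supply the loop with asyncio.run().

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


async def compress_batch_async_sketch(
    prompts: List[str],
    worker: Callable[[str], str],
    pool: ThreadPoolExecutor,
) -> List[str]:
    """Hypothetical stand-in: run blocking work in threads so the loop stays responsive."""
    loop = asyncio.get_running_loop()  # Raises RuntimeError outside a running loop.
    tasks = [loop.run_in_executor(pool, worker, p) for p in prompts]
    # gather() returns results in the same order as the submitted tasks.
    return list(await asyncio.gather(*tasks))


def main() -> List[str]:
    with ThreadPoolExecutor(max_workers=2) as pool:
        # asyncio.run() creates the event loop that the coroutine needs.
        return asyncio.run(compress_batch_async_sketch(["a", "b"], str.upper, pool))
```

From code that is already async, the coroutine would simply be awaited; asyncio.run() is only needed at the synchronous entry point.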