# NLProxy Service Module Reference

This document describes the compression orchestration service in `service/compression.py`.

## Purpose

`CompressionService` coordinates the full prompt compression workflow, including shielding, segmentation, compression, reconstruction, and safety validation.

## Primary Class

### `CompressionService`

#### Responsibilities

- Orchestrates prompt transformation across multiple core modules.
- Executes the shielding, semantic segmentation, compression, and reconstruction stages.
- Provides thread pool parallelism for batch workloads.
- Optionally integrates Redis-backed semantic caching.
- Controls privacy mode and NLI refinement.

#### Constructor

```python
CompressionService(
    use_cache: bool = True,
    device: Optional[str] = None,
    redis_url: Optional[str] = None,
    nli_refinement_fn: Optional = None,
    privacy_mode: bool = False,
    models_dir: Optional[Path] = None,
    llm_default_model: Optional[str] = None,
    thread_pool_workers: Optional[int] = None,
)
```

#### Key Behaviors

- Builds a thread pool via `ThreadPoolExecutor(max_workers=self.thread_pool_workers)`.
- Reads `NLPROXY_COMPRESSION_WORKERS` to override the default worker count.
- Initializes `PromptShield`, `SemanticSegmenter`, `SemanticCompressor`, `PromptReconstructor`, and `SafetyChecker`.
- Optionally initializes `SemanticLLMCache` when Redis is configured.
- Caches shield and embedding results in memory when `use_cache=True`.

#### Pipeline Stages

1. Shielding: `PromptShield` protects sensitive text and extracts restrictions.
2. Segmentation: `SemanticSegmenter` splits text into sentences and encodes them.
3. Compression: `SemanticCompressor` selects representative sentence clusters.
4. Reconstruction: `PromptReconstructor` rebuilds the prompt text and computes metrics.
5. Safety: `SafetyChecker` validates intent preservation and, optionally, perplexity.

#### Parallel Execution

- Uses `ThreadPoolExecutor` for parallel shield and compression tasks.
- Submits `_shield_with_cache` and `_process_single` jobs concurrently.
- Collects results with `as_completed()`.
- Ensures blocking CPU-bound operations do not stall the event loop.

## Performance Characteristics

- Latency is dominated by embedding generation and LLM inference.
- Batch complexity is roughly O(N · T_stage / M), where N is the prompt count, M is the worker count, and T_stage is the per-prompt time of a pipeline stage.
- Effective compression aggressiveness adapts to NLI confidence and domain mode.

## Configuration

- `NLPROXY_COMPRESSION_WORKERS` controls the thread pool size.
- `privacy_mode` toggles strict handling of protected entities.
- `redis_url` enables the distributed semantic cache.
- `models_dir` sets the local model artifact directory.

## Dependencies

- `numpy`
- `redis` (optional)
- `sentence_transformers`
- `optimum.onnxruntime` (for ONNX segmenter backends)

## Edge Cases

- Empty prompts return a result with zero tokens and a safety alert.
- If Redis is unavailable, the service falls back to running with the semantic cache disabled.
- `compress_batch_async` must be awaited from within a running asyncio event loop.
- Compression failures are retried in the API layer, up to its configured limits.
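The thread pool behavior described under Key Behaviors and Parallel Execution can be sketched as a small, self-contained class. This is not the module's actual code: `MiniCompressionService`, `resolve_worker_count`, `_run_indexed`, and `DEFAULT_WORKERS` are hypothetical names; the shield and compression steps are trivial placeholders for `PromptShield` and the downstream stages; and giving an explicit `thread_pool_workers` argument precedence over `NLPROXY_COMPRESSION_WORKERS` is an assumption.

```python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed
from threading import Lock
from typing import Dict, List, Optional

# Hypothetical fallback; the real default worker count is not documented here.
DEFAULT_WORKERS = 4


def resolve_worker_count(explicit: Optional[int] = None) -> int:
    """Assumed precedence: explicit argument, then env var, then default."""
    if explicit is not None:
        return max(1, explicit)
    raw = os.environ.get("NLPROXY_COMPRESSION_WORKERS")
    if raw is not None:
        try:
            return max(1, int(raw))
        except ValueError:
            pass  # malformed value: fall through to the default
    return DEFAULT_WORKERS


class MiniCompressionService:
    """Hypothetical stand-in for CompressionService's batch orchestration."""

    def __init__(self, thread_pool_workers: Optional[int] = None, use_cache: bool = True):
        self.thread_pool_workers = resolve_worker_count(thread_pool_workers)
        self.executor = ThreadPoolExecutor(max_workers=self.thread_pool_workers)
        self.use_cache = use_cache
        self._shield_cache: Dict[str, str] = {}
        self._cache_lock = Lock()  # worker threads share the cache dict

    def _shield(self, prompt: str) -> str:
        # Placeholder for PromptShield: just trims surrounding whitespace.
        return prompt.strip()

    def _shield_with_cache(self, prompt: str) -> str:
        if not self.use_cache:
            return self._shield(prompt)
        with self._cache_lock:
            cached = self._shield_cache.get(prompt)
        if cached is not None:
            return cached
        result = self._shield(prompt)  # computed outside the lock
        with self._cache_lock:
            self._shield_cache[prompt] = result
        return result

    def _process_single(self, shielded: str) -> str:
        # Placeholder for segmentation/compression/reconstruction:
        # collapses runs of whitespace into single spaces.
        return " ".join(shielded.split())

    def _run_indexed(self, fn, items: List[str]) -> List[str]:
        results: List[str] = [""] * len(items)
        futures = {self.executor.submit(fn, item): i for i, item in enumerate(items)}
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()  # re-raises worker exceptions
        return results

    def compress_batch(self, prompts: List[str]) -> List[str]:
        shielded = self._run_indexed(self._shield_with_cache, prompts)
        return self._run_indexed(self._process_single, shielded)

    def close(self) -> None:
        self.executor.shutdown(wait=True)
```

Mapping each future back to its input index keeps the output order stable even though `as_completed()` yields futures in completion order.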
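The Parallel Execution note about not stalling the event loop, and the Edge Cases requirement that `compress_batch_async` run inside an event loop, match the standard asyncio pattern of offloading blocking work with `loop.run_in_executor`. The sketch below assumes that pattern; `compress_one` is a hypothetical placeholder for the blocking per-prompt pipeline, and this signature of `compress_batch_async` is illustrative rather than the service's actual API.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from typing import List


def compress_one(prompt: str) -> str:
    # Hypothetical blocking stand-in for the per-prompt pipeline.
    return " ".join(prompt.split())


async def compress_batch_async(prompts: List[str], executor: ThreadPoolExecutor) -> List[str]:
    # Inside a coroutine a loop is always running; get_running_loop() returns it.
    loop = asyncio.get_running_loop()
    # Each blocking call runs on a pool thread, so the event loop stays responsive.
    futures = [loop.run_in_executor(executor, compress_one, p) for p in prompts]
    # gather() returns results in the same order as its arguments.
    return list(await asyncio.gather(*futures))


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=2) as pool:
        print(asyncio.run(compress_batch_async(["  x  y ", "z"], pool)))  # prints ['x y', 'z']
```

Calling `compress_batch_async(...)` without awaiting it (or without `asyncio.run`) only creates a coroutine object; no work runs until an event loop drives it.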
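To make the Performance Characteristics estimate concrete, assume every prompt takes the same time T_stage in a stage and the stage parallelizes perfectly across M workers. Work then proceeds in ⌈N/M⌉ waves, which N · T_stage / M approximates when N is much larger than M. The numbers below are illustrative, not measurements. Python threads only deliver this speedup when the heavy work runs in native code that releases the GIL, as numerical and inference libraries typically do.

$$
T_{\text{batch}} \approx \left\lceil \frac{N}{M} \right\rceil T_{\text{stage}} \approx \frac{N \cdot T_{\text{stage}}}{M}
$$

With N = 100, M = 8, and T_stage = 50 ms:

$$
\left\lceil \tfrac{100}{8} \right\rceil \cdot 50\,\text{ms} = 13 \cdot 50\,\text{ms} = 650\,\text{ms}, \qquad \frac{100 \cdot 50\,\text{ms}}{8} = 625\,\text{ms},
$$

compared with 100 · 50 ms = 5000 ms when run sequentially.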
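The Redis fallback listed under Edge Cases can be illustrated with redis-py. The helper name `init_semantic_cache_client`, the 0.5 s connect timeout, and catching every exception are assumptions made for this sketch; the module's real logic sits behind `SemanticLLMCache` and may differ.

```python
from typing import Any, Optional

try:
    import redis  # optional dependency
except ImportError:  # redis-py is not installed
    redis = None


def init_semantic_cache_client(redis_url: Optional[str]) -> Optional[Any]:
    """Hypothetical helper: return a live Redis client, or None to disable the cache."""
    if not redis_url or redis is None:
        return None
    try:
        client = redis.Redis.from_url(redis_url, socket_connect_timeout=0.5)
        client.ping()  # from_url() connects lazily; ping() forces a round trip
        return client
    except Exception:
        # Bad URLs, refused connections, and auth errors all disable the cache
        # instead of failing service startup.
        return None
```

Doing the `ping()` at startup turns a lazily discovered outage into an explicit decision to run without the semantic cache.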