# NLProxy Service Module Reference

This document describes the compression orchestration service in `service/compression.py`.
## Purpose

`CompressionService` coordinates the full prompt compression workflow, including shielding, segmentation, compression, reconstruction, and safety validation.
## Primary Class

### `CompressionService`

#### Responsibilities

- Orchestrates prompt transformation across multiple core modules.
- Executes shielding, semantic segmentation, compression, and reconstruction stages.
- Provides thread pool parallelism for batch workloads.
- Optionally integrates Redis-backed semantic caching.
- Controls privacy mode and NLI refinement.
#### Constructor

```python
from pathlib import Path
from typing import Callable, Optional


class CompressionService:
    def __init__(
        self,
        use_cache: bool = True,
        device: Optional[str] = None,
        redis_url: Optional[str] = None,
        nli_refinement_fn: Optional[Callable] = None,
        privacy_mode: bool = False,
        models_dir: Optional[Path] = None,
        llm_default_model: Optional[str] = None,
        thread_pool_workers: Optional[int] = None,
    ) -> None: ...
```
#### Key Behaviors

- Builds a thread pool via `ThreadPoolExecutor(max_workers=self.thread_pool_workers)`.
- Reads `NLPROXY_COMPRESSION_WORKERS` to override the default worker count.
- Initializes `PromptShield`, `SemanticSegmenter`, `SemanticCompressor`, `PromptReconstructor`, and `SafetyChecker`.
- Optionally initializes `SemanticLLMCache` if Redis is configured.
- Caches shield and embedding results in memory when `use_cache=True`.
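The worker-count behavior above can be approximated by the sketch below. It assumes the explicit `thread_pool_workers` argument takes precedence over `NLPROXY_COMPRESSION_WORKERS`, and that a missing or malformed value falls back to the `ThreadPoolExecutor` default. That precedence and the helper names `resolve_worker_count` and `build_executor` are assumptions, not taken from `service/compression.py`.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Optional

WORKERS_ENV_VAR = "NLPROXY_COMPRESSION_WORKERS"


def resolve_worker_count(explicit: Optional[int] = None) -> Optional[int]:
    """Hypothetical helper that picks the thread pool size.

    Assumed precedence: explicit argument > environment variable > None.
    Returning None lets ThreadPoolExecutor apply its own default.
    """
    if explicit is not None:
        if explicit < 1:
            raise ValueError("thread_pool_workers must be a positive integer")
        return explicit
    raw = os.environ.get(WORKERS_ENV_VAR, "").strip()
    try:
        value = int(raw)
    except ValueError:
        return None  # unset or malformed: fall back to the executor default
    return value if value >= 1 else None  # non-positive values are ignored


def build_executor(explicit: Optional[int] = None) -> ThreadPoolExecutor:
    # max_workers=None tells ThreadPoolExecutor to choose its default size.
    return ThreadPoolExecutor(max_workers=resolve_worker_count(explicit))
```

Returning `None` instead of a hard-coded number defers to the standard library default, which since Python 3.8 is `min(32, os.cpu_count() + 4)` (Python 3.13 counts only the CPUs usable by the process).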
#### Pipeline Stages

1. Shielding: `PromptShield` protects sensitive text and extracts restrictions.
2. Segmentation: `SemanticSegmenter` splits text into sentences and encodes them.
3. Compression: `SemanticCompressor` selects representative sentence clusters.
4. Reconstruction: `PromptReconstructor` rebuilds the prompt text and computes metrics.
5. Safety: `SafetyChecker` validates intent preservation and, optionally, perplexity.
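The five stages form a linear pipeline in which each stage consumes the previous stage's output. The sketch below shows only that orchestration shape, using toy stand-ins rather than the real `PromptShield`, `SemanticSegmenter`, `SemanticCompressor`, `PromptReconstructor`, and `SafetyChecker`: regex sentence splitting instead of embeddings, a keep-the-first-k rule instead of clustering, and whitespace word counts instead of tokens. All function names, the `[PROTECTED_n]` placeholder format, and the `CompressionResult` fields are hypothetical. It also short-circuits empty prompts, mirroring the documented empty-prompt edge case.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class CompressionResult:
    # Hypothetical result shape; field names are illustrative only.
    text: str
    original_tokens: int
    compressed_tokens: int
    alerts: List[str] = field(default_factory=list)


def shield(prompt: str) -> Tuple[str, Dict[str, str]]:
    """Toy stage 1: swap inline code spans for placeholders so later stages can't alter them."""
    protected: Dict[str, str] = {}

    def _stash(match: re.Match) -> str:
        key = f"[PROTECTED_{len(protected)}]"
        protected[key] = match.group(0)
        return key

    return re.sub(r"`[^`]+`", _stash, prompt), protected


def segment(text: str) -> List[str]:
    """Toy stage 2: naive sentence split on terminal punctuation (no embeddings)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def compress(sentences: List[str], keep: int) -> List[str]:
    """Toy stage 3: keep the first `keep` sentences plus any containing 'must'."""
    return [s for i, s in enumerate(sentences) if i < keep or "must" in s.lower()]


def reconstruct(sentences: List[str], protected: Dict[str, str]) -> str:
    """Toy stage 4: rejoin sentences and restore the shielded spans."""
    text = " ".join(sentences)
    for key, value in protected.items():
        text = text.replace(key, value)
    return text


def safety_check(compressed: str) -> List[str]:
    """Toy stage 5: flag output that lost everything."""
    return ["empty output"] if not compressed.strip() else []


def run_pipeline(prompt: str, keep: int = 2) -> CompressionResult:
    if not prompt.strip():
        # Empty prompt: zero tokens plus a safety alert, no stages run.
        return CompressionResult("", 0, 0, ["empty prompt"])
    shielded, protected = shield(prompt)
    kept = compress(segment(shielded), keep)
    text = reconstruct(kept, protected)
    return CompressionResult(
        text=text,
        original_tokens=len(prompt.split()),
        compressed_tokens=len(text.split()),
        alerts=safety_check(text),
    )
```

Keeping each stage behind its own function boundary means a toy stage can be swapped for a real component without touching `run_pipeline`.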
#### Parallel Execution

- Uses `ThreadPoolExecutor` for parallel shield and compression tasks.
- Submits `_shield_with_cache` and `_process_single` jobs concurrently.
- Collects results with `as_completed()`.
- Runs blocking, CPU-bound work in worker threads so it does not stall the asyncio event loop.
## Performance Characteristics

- Latency is dominated by embedding generation and LLM inference.
- Batch wall-clock time scales roughly as O(N · T_stage / M), where N is the prompt count, T_stage is the per-prompt pipeline time, and M is the worker count.
- Effective compression aggressiveness adapts to NLI confidence and the domain mode.
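The batch estimate follows from a simple scheduling argument, under two simplifying assumptions: every prompt takes about the same time T_stage, and the M workers truly run in parallel (in CPython this holds only when the heavy stages release the GIL). The N prompts are then processed in ceil(N / M) rounds of roughly T_stage each:

```latex
T_{\text{batch}} \approx \left\lceil \frac{N}{M} \right\rceil \cdot T_{\text{stage}}
  = O\!\left(\frac{N \cdot T_{\text{stage}}}{M}\right)
```

For example, N = 100 prompts on M = 8 workers take ceil(12.5) = 13 rounds, about 13 · T_stage instead of 100 · T_stage sequentially, a speedup of roughly 7.7×.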
## Configuration

- `NLPROXY_COMPRESSION_WORKERS` controls the thread pool size.
- `privacy_mode` toggles strict handling of protected entities.
- `redis_url` enables the distributed semantic cache.
- `models_dir` defines the local model artifact directory.
## Dependencies

- `numpy`
- `redis` (optional)
- `sentence_transformers`
- `optimum.onnxruntime` (for ONNX segmenter backends)
## Edge Cases

- Empty prompts return a result with zero tokens and a safety alert.
- If Redis is unavailable, the service falls back to running with the semantic cache disabled.
- `compress_batch_async` must be awaited from within a running asyncio event loop.
- Retries after compression failures are handled by the API layer, up to its configured limits.
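The Redis fallback can be sketched as a guarded connection attempt: if the optional `redis` package is missing, the URL is absent or malformed, or the server does not answer `PING`, the helper returns `None` and the caller continues without a semantic cache. The helper name `connect_semantic_cache` and the 0.5-second connect timeout are assumptions, and wrapping the client in `SemanticLLMCache` is omitted because its constructor is not documented here.

```python
import logging
from typing import Any, Optional

try:
    import redis  # optional dependency
except ImportError:  # package not installed: caching is simply unavailable
    redis = None

logger = logging.getLogger(__name__)


def connect_semantic_cache(redis_url: Optional[str]) -> Optional[Any]:
    """Hypothetical helper: return a live Redis client, or None to disable caching."""
    if not redis_url or redis is None:
        return None
    try:
        client = redis.Redis.from_url(redis_url, socket_connect_timeout=0.5)
        client.ping()  # from_url connects lazily; PING forces a real round trip
        return client
    except (redis.exceptions.RedisError, OSError, ValueError) as exc:
        # ValueError covers malformed URLs; RedisError covers refused or timed-out connections.
        logger.warning("Redis unavailable (%s); semantic cache disabled", exc)
        return None
```

Failing soft here keeps a cache outage from becoming a compression outage; the cost is silently higher latency, which is why the sketch logs a warning.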