# NLProxy Service Module Reference

This document describes the compression orchestration service in `service/compression.py`.

## Purpose

`CompressionService` coordinates the full prompt compression workflow, including shielding, segmentation, compression, reconstruction, and safety validation.

## Primary Class

### `CompressionService`

#### Responsibilities

- Orchestrates prompt transformation across multiple core modules.
- Executes shielding, semantic segmentation, compression, and reconstruction stages.
- Provides thread pool parallelism for batch workloads.
- Optionally integrates Redis-backed semantic caching.
- Controls privacy mode and NLI refinement.

#### Constructor

```python
CompressionService(
    use_cache: bool = True,
    device: Optional[str] = None,
    redis_url: Optional[str] = None,
    nli_refinement_fn: Optional[Callable] = None,
    privacy_mode: bool = False,
    models_dir: Optional[Path] = None,
    llm_default_model: Optional[str] = None,
    thread_pool_workers: Optional[int] = None,
)
```

#### Key Behaviors

- Builds a thread pool via `ThreadPoolExecutor(max_workers=self.thread_pool_workers)`.
- Reads the `NLPROXY_COMPRESSION_WORKERS` environment variable to override the default worker count.
- Initializes `PromptShield`, `SemanticSegmenter`, `SemanticCompressor`, `PromptReconstructor`, and `SafetyChecker`.
- Optionally initializes `SemanticLLMCache` if Redis is configured.
- Caches shield and embedding results in memory when `use_cache=True`.
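
The worker-count behavior above can be sketched as follows. This is a minimal illustration, not the service's actual code: the helper names `resolve_worker_count` and `build_pool`, the fallback value of 4, and the precedence (explicit argument, then environment variable, then default) are all assumptions.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Optional

ENV_VAR = "NLPROXY_COMPRESSION_WORKERS"


def resolve_worker_count(explicit: Optional[int] = None, default: int = 4) -> int:
    """Hypothetical helper: pick the thread pool size.

    Assumed precedence: explicit argument > environment variable > default.
    """
    if explicit is not None and explicit > 0:
        return explicit
    raw = os.environ.get(ENV_VAR)
    if raw is not None:
        try:
            value = int(raw)
        except ValueError:
            value = 0  # malformed values are ignored in this sketch
        if value > 0:
            return value
    return default


def build_pool(explicit: Optional[int] = None) -> ThreadPoolExecutor:
    """Build the executor the same way the bullet above describes."""
    return ThreadPoolExecutor(max_workers=resolve_worker_count(explicit))
```

Ignoring malformed or non-positive values keeps a bad environment setting from preventing startup; a stricter service might prefer to raise instead.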

#### Pipeline Stages

1. Shielding: `PromptShield` protects sensitive text and extracts restrictions.
2. Segmentation: `SemanticSegmenter` splits text into sentences and encodes them.
3. Compression: `SemanticCompressor` selects representative sentence clusters.
4. Reconstruction: `PromptReconstructor` rebuilds prompt text and computes metrics.
5. Safety: `SafetyChecker` validates intent preservation and optional perplexity.
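
To make the stage order concrete, here is a toy end-to-end sketch. Every function below is a hypothetical stand-in: the real `PromptShield`, `SemanticSegmenter`, `SemanticCompressor`, `PromptReconstructor`, and `SafetyChecker` use embeddings and clustering, whereas this sketch uses regexes and exact-duplicate removal purely to show how data flows between the five stages.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class PipelineResult:
    text: str
    original_tokens: int
    compressed_tokens: int
    safe: bool


def shield(text: str) -> Tuple[str, Dict[str, str]]:
    """Stage 1 (toy): swap backtick-quoted spans for placeholders."""
    protected: Dict[str, str] = {}

    def _swap(match: re.Match) -> str:
        key = f"__P{len(protected)}__"
        protected[key] = match.group(0)
        return key

    return re.sub(r"`[^`]*`", _swap, text), protected


def segment(text: str) -> List[str]:
    """Stage 2 (toy): split on sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def compress(sentences: List[str]) -> List[str]:
    """Stage 3 (toy): keep the first occurrence of each sentence."""
    seen = set()
    kept = []
    for sentence in sentences:
        key = sentence.lower()
        if key not in seen:
            seen.add(key)
            kept.append(sentence)
    return kept


def reconstruct(sentences: List[str], protected: Dict[str, str]) -> str:
    """Stage 4 (toy): join sentences and restore protected spans."""
    text = " ".join(sentences)
    for key, value in protected.items():
        text = text.replace(key, value)
    return text


def run_pipeline(prompt: str) -> PipelineResult:
    shielded, protected = shield(prompt)
    kept = compress(segment(shielded))
    rebuilt = reconstruct(kept, protected)
    # Stage 5 (toy): flag empty output or any lost protected span.
    safe = bool(kept) and all(v in rebuilt for v in protected.values())
    return PipelineResult(rebuilt, len(prompt.split()), len(rebuilt.split()), safe)
```

Calling `run_pipeline("Keep `api_key` secret. Be brief. Be brief.")` drops the repeated sentence while leaving the backtick-quoted span untouched.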

#### Parallel Execution

- Uses `ThreadPoolExecutor` for parallel shield and compression tasks.
- Submits `_shield_with_cache` and `_process_single` jobs concurrently.
- Collects results with `as_completed()`.
- Keeps blocking, CPU-bound stages off the event loop so that async callers are not stalled.
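
The submit-and-collect pattern described above might look like the sketch below. The names `process_batch` and `process_batch_async` are hypothetical, and `worker` stands in for per-prompt methods such as `_process_single`. Results are written back by index because `as_completed()` yields futures in completion order, not submission order.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, List


def process_batch(
    prompts: List[str],
    worker: Callable[[str], str],
    max_workers: int = 4,
) -> List[str]:
    """Run `worker` over all prompts in a thread pool, preserving input order."""
    results: List[str] = [""] * len(prompts)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        future_to_index = {
            pool.submit(worker, prompt): i for i, prompt in enumerate(prompts)
        }
        for future in as_completed(future_to_index):
            # future.result() re-raises any exception raised inside the worker.
            results[future_to_index[future]] = future.result()
    return results


async def process_batch_async(
    prompts: List[str],
    worker: Callable[[str], str],
    max_workers: int = 4,
) -> List[str]:
    """Offload the blocking batch call so the event loop stays responsive."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(
        None, process_batch, prompts, worker, max_workers
    )
```

Threads only give a real speedup when the work releases the GIL, which is typically the case for native inference and array libraries but not for pure-Python loops.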

## Performance Characteristics

- Latency is dominated by embedding generation and LLM inference.
- Batch wall-clock time is roughly O(N · T_stage / M), where N is the prompt count, T_stage is the per-prompt time summed over the pipeline stages, and M is the worker count.
- Effective compression aggressiveness adapts based on NLI confidence and domain mode.
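
As a back-of-the-envelope illustration of the batch estimate above, the sketch below assumes every prompt takes the same time and that workers run fully in parallel; both are simplifications. The function name `estimate_batch_seconds` is hypothetical.

```python
import math


def estimate_batch_seconds(n_prompts: int, stage_seconds: float, workers: int) -> float:
    """Rough wall-clock estimate: prompts are processed in waves of `workers`."""
    if workers <= 0:
        raise ValueError("workers must be positive")
    waves = math.ceil(n_prompts / workers)
    return waves * stage_seconds
```

For example, 10 prompts at 0.5 s each on 4 workers need 3 waves, so the estimate is 1.5 s. Rounding up to whole waves makes this slightly more pessimistic than N · T_stage / M, which would give 1.25 s.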

## Configuration

- `NLPROXY_COMPRESSION_WORKERS` controls thread pool size.
- `privacy_mode` toggles strict handling of protected entities.
- `redis_url` enables distributed semantic cache.
- `models_dir` defines the local model artifact directory.

## Dependencies

- `numpy`
- `redis` (optional)
- `sentence_transformers`
- `optimum.onnxruntime` (for ONNX segmenter backends)

## Edge Cases

- Empty prompts return a result with zero tokens and a safety alert.
- If Redis is unavailable, the service falls back to running with the semantic cache disabled.
- `compress_batch_async` must be called from within a running event loop.
- Compression failures are retried up to configured limits in the API layer.
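
The Redis fallback can be sketched as below. This is an assumption about the shape of the logic, not the service's code: `init_semantic_cache` is a hypothetical helper, and the connection factory is injected so the example runs without a Redis server. In practice `connect` could be `redis.Redis.from_url`, whose client exposes `ping()`.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger("nlproxy.sketch")


def init_semantic_cache(
    redis_url: Optional[str],
    connect: Callable[[str], Any],
) -> Optional[Any]:
    """Return a connected client, or None when the cache should stay disabled."""
    if not redis_url:
        return None  # cache not configured
    try:
        client = connect(redis_url)
        client.ping()  # fail fast if the server is unreachable
        return client
    except Exception as exc:  # broad on purpose: any failure disables the cache
        logger.warning("Semantic cache disabled: %s", exc)
        return None
```

Returning `None` rather than raising lets the rest of the pipeline treat "no cache" and "cache down" identically.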