	# NLProxy Service Module Reference

	This document describes the compression orchestration service in `service/compression.py`.

	## Purpose

	`CompressionService` coordinates the full prompt compression workflow, including shielding, segmentation, compression, reconstruction, and safety validation.

	## Primary Class

	### `CompressionService`

	#### Responsibilities

	- Orchestrates prompt transformation across multiple core modules.
	- Executes shielding, semantic segmentation, compression, and reconstruction stages.
	- Provides thread pool parallelism for batch workloads.
	- Optionally integrates Redis-backed semantic caching.
	- Controls privacy mode and NLI refinement.

	#### Constructor

	```python
	CompressionService(
	    use_cache: bool = True,
	    device: Optional[str] = None,
	    redis_url: Optional[str] = None,
	    nli_refinement_fn: Optional = None,
	    privacy_mode: bool = False,
	    models_dir: Optional[Path] = None,
	    llm_default_model: Optional[str] = None,
	    thread_pool_workers: Optional[int] = None,
	)
	```

	#### Key Behaviors

	- Builds a thread pool via `ThreadPoolExecutor(max_workers=self.thread_pool_workers)`.
	- Reads `NLPROXY_COMPRESSION_WORKERS` to override the default worker count.
	- Initializes `PromptShield`, `SemanticSegmenter`, `SemanticCompressor`, `PromptReconstructor`, and `SafetyChecker`.
	- Optionally initializes `SemanticLLMCache` if Redis is configured.
	- Caches shield and embedding results in memory when `use_cache=True`.
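
The worker-count behavior above can be sketched as follows. Only the `NLPROXY_COMPRESSION_WORKERS` variable and the `ThreadPoolExecutor(max_workers=...)` call come from this document. The helper names `resolve_worker_count` and `build_pool`, the precedence order (explicit argument, then environment variable, then a CPU-based default), and the default formula are assumptions made for illustration.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Optional

ENV_VAR = "NLPROXY_COMPRESSION_WORKERS"


def resolve_worker_count(explicit: Optional[int] = None) -> int:
    """Pick a pool size (assumed precedence: argument, env var, default)."""
    if explicit is not None:
        return max(1, explicit)
    raw = os.environ.get(ENV_VAR, "").strip()
    # Ignore values that are missing, non-numeric, or zero.
    if raw.isdigit() and int(raw) > 0:
        return int(raw)
    # Assumed default, similar in spirit to the standard library's heuristic.
    return min(32, (os.cpu_count() or 1) + 4)


def build_pool(explicit: Optional[int] = None) -> ThreadPoolExecutor:
    """Hypothetical helper: create the executor with the resolved size."""
    return ThreadPoolExecutor(max_workers=resolve_worker_count(explicit))
```

Falling back to a default on a malformed value, instead of raising, is a design choice of this sketch; the real service may validate differently.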

	#### Pipeline Stages

	1. Shielding: `PromptShield` protects sensitive text and extracts restrictions.
	2. Segmentation: `SemanticSegmenter` splits text into sentences and encodes them.
	3. Compression: `SemanticCompressor` selects representative sentence clusters.
	4. Reconstruction: `PromptReconstructor` rebuilds prompt text and computes metrics.
	5. Safety: `SafetyChecker` validates intent preservation and optional perplexity.
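
The stage order above can be illustrated with a minimal sketch in which each stage is a plain callable. None of these signatures are taken from the NLProxy source: `run_pipeline`, `PipelineResult`, the whitespace token count, and the toy stand-in stages are all hypothetical and only show how the five steps chain together.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PipelineResult:
    text: str
    original_tokens: int
    compressed_tokens: int
    alerts: List[str] = field(default_factory=list)


def run_pipeline(
    prompt: str,
    shield: Callable[[str], str],
    segment: Callable[[str], List[str]],
    compress: Callable[[List[str]], List[str]],
    reconstruct: Callable[[List[str]], str],
    is_safe: Callable[[str, str], bool],
) -> PipelineResult:
    """Run the five stages in order; every callable is a placeholder."""
    original_tokens = len(prompt.split())  # crude whitespace token count
    shielded = shield(prompt)              # 1. Shielding
    sentences = segment(shielded)          # 2. Segmentation
    kept = compress(sentences)             # 3. Compression
    rebuilt = reconstruct(kept)            # 4. Reconstruction
    # 5. Safety: record an alert instead of raising.
    alerts = [] if is_safe(prompt, rebuilt) else ["safety check failed"]
    return PipelineResult(rebuilt, original_tokens, len(rebuilt.split()), alerts)


# Toy stand-ins: split on periods and drop exact duplicate sentences.
result = run_pipeline(
    "Summarize the report. Summarize the report. Keep it short.",
    shield=lambda text: text,
    segment=lambda text: [s.strip() for s in text.split(".") if s.strip()],
    compress=lambda sents: list(dict.fromkeys(sents)),
    reconstruct=lambda sents: ". ".join(sents) + ".",
    is_safe=lambda original, rebuilt: bool(rebuilt.strip()),
)
print(result.text)
```

In the real service the stages are backed by embedding models and clustering, so the stand-ins here reflect only the data flow, not the compression quality.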

	#### Parallel Execution

	- Uses `ThreadPoolExecutor` for parallel shield and compression tasks.
	- Submits `_shield_with_cache` and `_process_single` jobs concurrently.
	- Collects results with `as_completed()`.
	- Ensures blocking CPU-bound operations do not stall the event loop.
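
A minimal sketch of this fan-out pattern, using only the standard library, is shown below. The function names `process_batch` and `process_batch_async` and the `work` callable are hypothetical stand-ins for `_shield_with_cache` and `_process_single`, whose real signatures are not given in this document.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, List


def process_batch(
    prompts: List[str], work: Callable[[str], str], workers: int = 4
) -> List[str]:
    """Fan prompts out to a thread pool and return results in input order."""
    results: List[str] = [""] * len(prompts)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(work, p): i for i, p in enumerate(prompts)}
        # as_completed yields futures in finish order, so map back by index.
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results


async def process_batch_async(
    prompts: List[str], work: Callable[[str], str]
) -> List[str]:
    """Run the blocking batch off the event loop thread."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, process_batch, prompts, work)
```

Because `asyncio.get_running_loop()` raises `RuntimeError` outside a running loop, the async variant has to be awaited from a coroutine, for example via `asyncio.run(process_batch_async(prompts, work))`.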

	## Performance Characteristics

	- Latency is dominated by embedding generation and LLM inference.
	- Batch processing time is roughly O(N · T_stage / M), where N is the prompt count, M is the worker count, and T_stage is the per-prompt cost of the pipeline stages.
	- Effective compression aggressiveness adapts based on NLI confidence and domain mode.
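
As a rough worked example of the batch estimate above, the sketch below treats the pool as running prompts in waves of M at a time. This is an idealized model, not a measurement: `estimate_batch_seconds` is a hypothetical helper, and it assumes every prompt costs the same and that workers do not contend for shared resources.

```python
import math


def estimate_batch_seconds(
    prompt_count: int, stage_seconds: float, workers: int
) -> float:
    """Idealized estimate: ceil(N / M) waves, each costing T_stage seconds."""
    if prompt_count <= 0:
        return 0.0
    waves = math.ceil(prompt_count / max(1, workers))
    return waves * stage_seconds


# 100 prompts at 0.2 s each on 8 workers: 13 waves of 0.2 s.
print(estimate_batch_seconds(100, 0.2, 8))
```

Real throughput will usually fall short of this bound when the model backend serializes inference or holds the interpreter lock.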

	## Configuration

	- `NLPROXY_COMPRESSION_WORKERS` controls thread pool size.
	- `privacy_mode` toggles strict handling of protected entities.
	- `redis_url` enables the distributed semantic cache.
	- `models_dir` defines the local model artifact directory.

	## Dependencies

	- `numpy`
	- `redis` (optional)
	- `sentence_transformers`
	- `optimum.onnxruntime` (for ONNX segmenter backends)

	## Edge Cases

	- Empty prompts return a result with zero tokens and a safety alert.
	- If Redis is unavailable, the service falls back to running with the semantic cache disabled.
	- `compress_batch_async` must be called within an async event loop.
	- Compression failures are retried up to configured limits in the API layer.
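
The Redis fallback described above might be structured like the following sketch. The names `connect_semantic_cache` and `redis_connect` are hypothetical, and how `SemanticLLMCache` actually connects is not shown in this document. The only third-party calls used are `redis.Redis.from_url` and `ping` from the `redis` package, imported lazily because the dependency is optional.

```python
from typing import Any, Callable, Optional


def redis_connect(url: str) -> Any:
    """Open a client and verify the server responds."""
    import redis  # optional dependency, imported only when needed

    client = redis.Redis.from_url(url)
    client.ping()  # raises if the server cannot be reached
    return client


def connect_semantic_cache(
    redis_url: Optional[str],
    connect: Callable[[str], Any] = redis_connect,
) -> Optional[Any]:
    """Return a cache client, or None when Redis is unset or unreachable."""
    if not redis_url:
        return None
    try:
        return connect(redis_url)
    except Exception as exc:  # a real service would likely catch narrower errors
        print(f"semantic cache disabled: {exc}")
        return None
```

Returning `None` lets callers skip cache lookups with a simple `if cache is not None` check, so a Redis outage degrades performance without failing requests.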