

Luiserb

first commit

2129c29 13 days ago


	# NLProxy Cache Module Reference

	This document describes `cache/semantic_cache.py`.

	## Purpose

	`SemanticLLMCache` provides a Redis-backed semantic cache for LLM prompt-response pairs. It stores each response with its metadata and the embedding vector of the prompt, so a new prompt that is semantically similar to a cached one can be answered from the cache instead of regenerating the response.

	## Key Class

	### `SemanticLLMCache`

	#### Responsibilities

	- Normalize and store embedding vectors in RedisVL.
	- Search cached vectors based on cosine similarity.
	- Enforce TTL-based expiration and domain isolation.
	- Maintain hit/miss statistics.

	#### Constructor

	```python
	SemanticLLMCache(
	    redis_url: str = "redis://localhost:6379",
	    similarity_threshold: float = 0.92,
	    default_ttl: int = 3600,
	    dimension: int = 384,
	    index_name: str = "prompt_cache",
	    prefix: str = "cache:",
	    max_connections: int = 50,
	    socket_timeout: float = 5.0,
	)
	```

	#### Important Methods

	- `_normalize(embedding: np.ndarray) -> List[float]`
	  - Converts raw embeddings into L2-normalized Python lists.
	  - Complexity: O(d).

	- `store(query_embedding, response_text, metadata, domain)`
	  - Stores a cached entry in a RedisVL vector index.
	  - Writes both vector and metadata fields.

	- `search(query_embedding, domain=None) -> Optional[Dict[str, Any]]`
	  - Performs vector similarity search with threshold filtering.
	  - Complexity: O(N · d) for a flat scan over N entries of dimension d; the actual cost depends on how the RedisVL index is configured.

	- `clear(domain: Optional[str] = None)`
	  - Deletes cached entries globally or within a domain.

	- `get_stats() -> Dict[str, int]`
	  - Returns hit/miss counters.
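
The method contract above can be illustrated with a small in-memory stand-in. This is a sketch only, not the module's implementation: `ToySemanticCache`, the module-level `normalize` helper, the `"hits"`/`"misses"` key names, and the shape of the returned dictionary are all assumptions made for illustration, and a plain Python list replaces the RedisVL index.

```python
from typing import Any, Dict, List, Optional

import numpy as np


def normalize(embedding: np.ndarray) -> List[float]:
    """Return the L2-normalized embedding as a plain Python list (O(d))."""
    vec = np.asarray(embedding, dtype=np.float64)
    norm = np.linalg.norm(vec)
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return (vec / norm).tolist()


class ToySemanticCache:
    """Hypothetical in-memory model of the documented methods (no Redis)."""

    def __init__(self, similarity_threshold: float = 0.92) -> None:
        self.similarity_threshold = similarity_threshold
        self._entries: List[Dict[str, Any]] = []
        # Key names are assumed; the real get_stats() keys may differ.
        self._stats = {"hits": 0, "misses": 0}

    def store(self, query_embedding, response_text, metadata, domain) -> None:
        # Keep the normalized vector next to its metadata fields.
        self._entries.append({
            "vector": normalize(query_embedding),
            "response": response_text,
            "metadata": metadata,
            "domain": domain,
        })

    def search(self, query_embedding, domain=None) -> Optional[Dict[str, Any]]:
        query = np.array(normalize(query_embedding))
        best, best_score = None, -1.0
        for entry in self._entries:
            # Domain isolation: skip entries stored under another domain.
            if domain is not None and entry["domain"] != domain:
                continue
            # For unit vectors, the dot product equals cosine similarity.
            score = float(np.dot(query, entry["vector"]))
            if score > best_score:
                best, best_score = entry, score
        if best is None or best_score < self.similarity_threshold:
            self._stats["misses"] += 1
            return None
        self._stats["hits"] += 1
        return {
            "response": best["response"],
            "metadata": best["metadata"],
            "similarity": best_score,
        }

    def clear(self, domain: Optional[str] = None) -> None:
        if domain is None:
            self._entries.clear()
        else:
            self._entries = [e for e in self._entries if e["domain"] != domain]

    def get_stats(self) -> Dict[str, int]:
        return dict(self._stats)


cache = ToySemanticCache(similarity_threshold=0.92)
cache.store(np.array([3.0, 4.0]), "cached answer", {"model": "demo"}, "support")
print(cache.search(np.array([3.0, 4.1]), domain="support"))  # near-duplicate prompt
print(cache.search(np.array([3.0, 4.1]), domain="billing"))  # other domain
print(cache.get_stats())
```

TTL handling is deliberately left out of this sketch to keep the focus on normalization, threshold filtering, domain filtering, and the counters.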

	## Dependencies

	- `redis` / `redis-py`
	- `redisvl` for vector search index management
	- `numpy`

	## Performance Characteristics

	- Embedding normalization is linear in embedding dimension.
	- Search cost scales with number of indexed entries and vector size.
	- RedisVL reduces query latency compared to raw key scans, but the module remains CPU-bound for large indexes.
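
To make the scaling concrete, the following sketch shows what a brute-force (flat) cosine search looks like in NumPy. It is an illustration of the cost model, not the code path RedisVL uses: `flat_scan` is a hypothetical helper, and it assumes the stored rows are already L2-normalized so that a dot product equals cosine similarity.

```python
from typing import Optional, Tuple

import numpy as np


def flat_scan(
    index: np.ndarray, query: np.ndarray, threshold: float = 0.92
) -> Optional[Tuple[int, float]]:
    """Brute-force search over an (N, d) matrix of unit-length rows.

    Returns (row, similarity) for the best match at or above the
    threshold, or None when nothing qualifies.
    """
    if index.shape[0] == 0:
        return None
    norm = np.linalg.norm(query)
    if norm == 0.0:
        raise ValueError("cannot search with a zero vector")
    query = query / norm        # O(d) normalization
    scores = index @ query      # N dot products of length d: O(N * d)
    best = int(np.argmax(scores))
    if scores[best] < threshold:
        return None
    return best, float(scores[best])


rng = np.random.default_rng(0)
index = rng.normal(size=(1000, 384))
index /= np.linalg.norm(index, axis=1, keepdims=True)  # normalize each row

query = index[42] + 0.01 * rng.normal(size=384)  # slightly perturbed copy of row 42
print(flat_scan(index, query))
```

Doubling either the number of rows or the embedding dimension doubles the work in the matrix-vector product, which is why large indexes tend to need an approximate index rather than a flat scan.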

	## Scalability Considerations

	- The Redis connection pool defaults to 50 connections and is configurable via `max_connections`.
	- `socket_timeout` (default `5.0`) bounds how long a socket operation may block, so network faults fail fast instead of hanging.
	- For high-volume deployments, Redis clustering or an approximate nearest neighbor store is recommended.

	## Operational Guidelines

	- Ensure `dimension` matches the embedding model output size.
	- Configure `similarity_threshold` carefully; values near `1.0` reduce false positives but also lower hit rate.
	- Monitor hit/miss ratios and eviction trends.
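
Two of these guidelines are easy to automate. The sketch below is a hedged example of how a caller might do so: `hit_rate` and `check_dimension` are hypothetical helpers that are not part of the module, and the `"hits"` and `"misses"` key names are assumptions about what `get_stats()` returns.

```python
from typing import Dict, Sequence


def hit_rate(stats: Dict[str, int]) -> float:
    """Fraction of lookups served from the cache; 0.0 when there were none."""
    hits = stats.get("hits", 0)      # assumed key name
    misses = stats.get("misses", 0)  # assumed key name
    total = hits + misses
    return hits / total if total else 0.0


def check_dimension(embedding: Sequence[float], expected: int = 384) -> None:
    """Raise early if an embedding does not match the configured dimension."""
    if len(embedding) != expected:
        raise ValueError(
            f"embedding has {len(embedding)} dimensions, expected {expected}"
        )


print(hit_rate({"hits": 3, "misses": 1}))
check_dimension([0.0] * 384)  # passes silently for a 384-dimensional vector
```

Tracking the hit rate before and after changing `similarity_threshold` gives a direct view of the trade-off between false positives and cache effectiveness.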

	## Edge Cases

	- The cache treats a missing Redis connection as a hard failure during initialization.
	- A vector index with a mismatched schema or an incompatible dimension will fail to create.
	- Entries whose TTL has elapsed are not removed instantly; they remain stored until a read or cleanup operation occurs.
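
The last point describes lazy expiration. The toy model below illustrates that behavior under stated assumptions: `LazyTTLStore` is a hypothetical in-memory class with an injectable clock, and it does not represent how Redis or `SemanticLLMCache` implements expiry internally.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class LazyTTLStore:
    """Toy key-value store whose expired entries linger until touched."""

    def __init__(
        self, default_ttl: int = 3600, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self._default_ttl = default_ttl
        self._clock = clock
        self._data: Dict[str, Tuple[Any, float]] = {}

    def set(self, key: str, value: Any, ttl: Optional[int] = None) -> None:
        ttl = self._default_ttl if ttl is None else ttl
        self._data[key] = (value, self._clock() + ttl)

    def get(self, key: str) -> Optional[Any]:
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self._clock() >= expires_at:
            del self._data[key]  # the stale entry is removed only now, on read
            return None
        return value

    def cleanup(self) -> int:
        """Remove every expired entry and return how many were dropped."""
        now = self._clock()
        expired = [k for k, (_, exp) in self._data.items() if now >= exp]
        for key in expired:
            del self._data[key]
        return len(expired)

    def __len__(self) -> int:
        return len(self._data)


now = [0.0]  # fake clock so the example does not need to sleep
store = LazyTTLStore(default_ttl=10, clock=lambda: now[0])
store.set("a", 1)
store.set("b", 2)
now[0] = 11.0            # both TTLs have elapsed
print(len(store))        # stale entries are still counted
print(store.get("a"))    # the read evicts "a"
print(store.cleanup())   # the sweep evicts the remaining stale entry
```

The practical consequence is that memory usage can lag behind the logical contents of the cache, which is worth keeping in mind when reading eviction metrics.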