# NLProxy Cache Module Reference

This module documents `cache/semantic_cache.py`.

## Purpose

`SemanticLLMCache` provides a Redis-backed semantic cache for LLM prompt-response pairs. It stores response metadata and embedding vectors, enabling retrieval of semantically similar prior prompts rather than regenerating responses.

## Key Class

### `SemanticLLMCache`

#### Responsibilities

- Normalize and store embedding vectors in RedisVL.
- Search cached vectors based on cosine similarity.
- Enforce TTL-based expiration and domain isolation.
- Maintain hit/miss statistics.

#### Constructor

```python
SemanticLLMCache(
    redis_url: str = "redis://localhost:6379",
    similarity_threshold: float = 0.92,
    default_ttl: int = 3600,
    dimension: int = 384,
    index_name: str = "prompt_cache",
    prefix: str = "cache:",
    max_connections: int = 50,
    socket_timeout: float = 5.0,
)
```

#### Important Methods

- `_normalize(embedding: np.ndarray) -> List[float]`
  - Converts raw embeddings into L2-normalized Python lists.
  - Complexity: O(d).
- `store(query_embedding, response_text, metadata, domain)`
  - Stores a cached entry in a RedisVL vector index.
  - Writes both vector and metadata fields.
- `search(query_embedding, domain=None) -> Optional[Dict[str, Any]]`
  - Performs vector similarity search with threshold filtering.
  - Complexity: O(N · d) for flat scan; uses RedisVL index heuristics.
- `clear(domain: Optional[str] = None)`
  - Deletes cached entries globally or within a domain.
- `get_stats() -> Dict[str, int]`
  - Returns hit/miss counters.

## Dependencies

- `redis` / `redis-py`
- `redisvl` for vector search index management
- `numpy`

## Performance Characteristics

- Embedding normalization is linear in embedding dimension.
- Search cost scales with number of indexed entries and vector size.
- RedisVL reduces query latency compared to raw key scans, but the module remains CPU-bound for large indexes.

## Scalability Considerations

- Default Redis connection pool size is 50.
  This is configurable via `max_connections`.
- `socket_timeout` ensures network faults fail fast.
- For high-volume deployments, Redis clustering or an approximate nearest neighbor store is recommended.

## Operational Guidelines

- Ensure `dimension` matches the embedding model output size.
- Configure `similarity_threshold` carefully; values near `1.0` reduce false positives but also lower the hit rate.
- Monitor hit/miss ratios and eviction trends.

## Edge Cases

- The cache treats a missing Redis connection as a hard failure during initialization.
- A vector index with a mismatched schema or incompatible dimension will fail to create.
- Entries are not removed the instant their TTL elapses; stale entries can persist until a read or cleanup operation occurs.
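
## Illustrative Sketch

The normalize, store, and search behavior described above can be illustrated with a small in-memory stand-in. This is a sketch under stated assumptions, not the module's implementation: `ToySemanticCache` and `l2_normalize` are hypothetical names, the `"hits"`/`"misses"` keys and the shape of the returned dictionary are guesses, and plain Python lists replace both NumPy arrays and the RedisVL index so the example stays dependency-free. TTL handling is omitted.

```python
import math
from typing import Any, Dict, List, Optional, Sequence


def l2_normalize(embedding: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm in O(d); a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in embedding))
    if norm == 0.0:
        return [float(x) for x in embedding]
    return [x / norm for x in embedding]


class ToySemanticCache:
    """Hypothetical in-memory stand-in for illustration only (no Redis, no TTL)."""

    def __init__(self, similarity_threshold: float = 0.92) -> None:
        self.similarity_threshold = similarity_threshold
        self._entries: List[Dict[str, Any]] = []
        self._stats = {"hits": 0, "misses": 0}  # assumed key names

    def store(self, query_embedding, response_text, metadata=None, domain="default"):
        # Vectors are normalized once at write time.
        self._entries.append({
            "vector": l2_normalize(query_embedding),
            "response": response_text,
            "metadata": metadata or {},
            "domain": domain,
        })

    def search(self, query_embedding, domain=None) -> Optional[Dict[str, Any]]:
        query = l2_normalize(query_embedding)
        best, best_score = None, -1.0
        for entry in self._entries:  # flat scan: O(N * d)
            if domain is not None and entry["domain"] != domain:
                continue  # domain isolation
            # For unit vectors, the dot product equals cosine similarity.
            score = sum(a * b for a, b in zip(query, entry["vector"]))
            if score > best_score:
                best, best_score = entry, score
        if best is None or best_score < self.similarity_threshold:
            self._stats["misses"] += 1
            return None
        self._stats["hits"] += 1
        return {
            "response": best["response"],
            "metadata": best["metadata"],
            "similarity": best_score,
        }

    def get_stats(self) -> Dict[str, int]:
        return dict(self._stats)


cache = ToySemanticCache(similarity_threshold=0.92)
cache.store([1.0, 0.0, 0.0], "Paris", domain="geo")

near = cache.search([0.99, 0.05, 0.0], domain="geo")   # close to the stored vector
far = cache.search([0.0, 1.0, 0.0], domain="geo")      # orthogonal, below threshold
other = cache.search([1.0, 0.0, 0.0], domain="math")   # same vector, different domain
```

Here `near` is a hit, while `far` and `other` are misses, so `cache.get_stats()` reports one hit and two misses. Raising `similarity_threshold` toward `1.0` would turn more near matches into misses, which is the trade-off noted in the operational guidelines.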