| # NLProxy Cache Module Reference | |
| This reference documents the `cache/semantic_cache.py` module. | |
| ## Purpose | |
| `SemanticLLMCache` provides a Redis-backed semantic cache for LLM prompt-response pairs. It stores response metadata and embedding vectors, enabling retrieval of semantically similar prior prompts rather than regenerating responses. | |
| ## Key Class | |
| ### `SemanticLLMCache` | |
| #### Responsibilities | |
| - Normalize embedding vectors and store them in a Redis vector index managed through RedisVL. | |
| - Search cached vectors based on cosine similarity. | |
| - Enforce TTL-based expiration and domain isolation. | |
| - Maintain hit/miss statistics. | |
| #### Constructor | |
| ```python | |
| SemanticLLMCache( | |
|     redis_url: str = "redis://localhost:6379", | |
|     similarity_threshold: float = 0.92, | |
|     default_ttl: int = 3600, | |
|     dimension: int = 384, | |
|     index_name: str = "prompt_cache", | |
|     prefix: str = "cache:", | |
|     max_connections: int = 50, | |
|     socket_timeout: float = 5.0, | |
| ) | |
| ``` | |
| #### Important Methods | |
| - `_normalize(embedding: np.ndarray) -> List[float]` | |
|   - Converts raw embeddings into L2-normalized Python lists. | |
|   - Complexity: O(d), where d is the embedding dimension. | |
| - `store(query_embedding, response_text, metadata, domain)` | |
|   - Stores a cached entry in a RedisVL vector index. | |
|   - Writes both vector and metadata fields. | |
| - `search(query_embedding, domain=None) -> Optional[Dict[str, Any]]` | |
|   - Performs vector similarity search with threshold filtering. | |
|   - Complexity: O(N · d) for a flat (brute-force) scan over N entries of dimension d; the actual cost depends on the index algorithm configured in RedisVL. | |
| - `clear(domain: Optional[str] = None)` | |
|   - Deletes cached entries globally or within a domain. | |
| - `get_stats() -> Dict[str, int]` | |
|   - Returns hit/miss counters. | |
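
The interplay of normalization, threshold filtering, and domain isolation can be illustrated with a minimal in-memory sketch. This is not the module's implementation: `InMemorySemanticCache` is a hypothetical stand-in that scans a Python list instead of a RedisVL index, it uses plain lists rather than `np.ndarray`, and the stat key names (`"hits"`, `"misses"`), the `"similarity"` result key, and the treatment of `domain=None` as "search all domains" are all assumptions.

```python
import math
from typing import Any, Dict, List, Optional, Sequence


def normalize(embedding: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm; O(d) in the vector length."""
    norm = math.sqrt(sum(x * x for x in embedding))
    if norm == 0.0:
        # A zero vector has no direction; return it unchanged.
        return [float(x) for x in embedding]
    return [x / norm for x in embedding]


class InMemorySemanticCache:
    """Hypothetical stand-in: a Python list replaces the RedisVL index."""

    def __init__(self, similarity_threshold: float = 0.92) -> None:
        self.similarity_threshold = similarity_threshold
        self._entries: List[Dict[str, Any]] = []
        self._stats = {"hits": 0, "misses": 0}  # key names are assumed

    def store(self, query_embedding, response_text, metadata=None, domain=None) -> None:
        # Vector and metadata fields are written together as one entry.
        self._entries.append({
            "vector": normalize(query_embedding),
            "response_text": response_text,
            "metadata": metadata or {},
            "domain": domain,
        })

    def search(self, query_embedding, domain=None) -> Optional[Dict[str, Any]]:
        query = normalize(query_embedding)
        best, best_score = None, -1.0
        for entry in self._entries:  # flat scan: O(N * d)
            # Assumption: domain=None searches every domain.
            if domain is not None and entry["domain"] != domain:
                continue
            # For unit vectors the dot product equals cosine similarity.
            score = sum(a * b for a, b in zip(query, entry["vector"]))
            if score > best_score:
                best, best_score = entry, score
        if best is None or best_score < self.similarity_threshold:
            self._stats["misses"] += 1
            return None
        self._stats["hits"] += 1
        return {**best, "similarity": best_score}

    def get_stats(self) -> Dict[str, int]:
        return dict(self._stats)
```

In this sketch, storing `[1.0, 0.0]` under domain `"geo"` and then searching with `[0.99, 0.05]` in the same domain yields a hit, while an orthogonal query or a query against a different domain counts as a miss.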
| ## Dependencies | |
| - `redis` / `redis-py` | |
| - `redisvl` for vector search index management | |
| - `numpy` | |
| ## Performance Characteristics | |
| - Embedding normalization is linear in embedding dimension. | |
| - Search cost scales with number of indexed entries and vector size. | |
| - RedisVL reduces query latency compared to raw key scans, but search remains CPU-bound on large indexes. | |
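
To make the scaling concrete, the following back-of-the-envelope estimate counts the multiply-add operations and vector memory for a brute-force scan. It is a rough sketch only: `flat_scan_cost` is a hypothetical helper, and the 4 bytes per value assumes vectors are stored as 32-bit floats, which this reference does not state.

```python
def flat_scan_cost(num_entries: int, dimension: int, bytes_per_value: int = 4) -> dict:
    """Estimate per-query work and vector memory for a brute-force scan."""
    return {
        # One multiply-add per vector component per indexed entry.
        "multiply_adds_per_query": num_entries * dimension,
        # Raw vector payload only; index and metadata overhead are ignored.
        "vector_bytes": num_entries * dimension * bytes_per_value,
    }


cost = flat_scan_cost(num_entries=1_000_000, dimension=384)
print(cost["multiply_adds_per_query"])         # 384000000
print(round(cost["vector_bytes"] / 2**30, 2))  # 1.43 (GiB)
```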
| ## Scalability Considerations | |
| - The Redis connection pool defaults to 50 connections and is configurable via `max_connections`. | |
| - `socket_timeout` (default `5.0` seconds) bounds how long a socket operation may block, so network faults fail fast instead of hanging. | |
| - For high-volume deployments, Redis clustering or an approximate nearest neighbor store is recommended. | |
| ## Operational Guidelines | |
| - Ensure `dimension` matches the embedding model output size. | |
| - Tune `similarity_threshold` (default `0.92`) carefully: values near `1.0` reduce false positives but also lower the hit rate. | |
| - Monitor hit/miss ratios and eviction trends. | |
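
Two of these guidelines lend themselves to small helper checks, sketched below. Both functions are hypothetical and not part of the module; in particular, the `"hits"` and `"misses"` keys are an assumption about what `get_stats()` returns.

```python
from typing import Dict, Sequence


def hit_rate(stats: Dict[str, int]) -> float:
    """Fraction of lookups served from cache; 0.0 when nothing was looked up."""
    hits = stats.get("hits", 0)      # assumed key name
    misses = stats.get("misses", 0)  # assumed key name
    total = hits + misses
    return hits / total if total else 0.0


def check_dimension(embedding: Sequence[float], expected: int = 384) -> None:
    """Raise early if an embedding does not match the configured dimension."""
    if len(embedding) != expected:
        raise ValueError(
            f"embedding has {len(embedding)} values, index expects {expected}"
        )
```

Calling `check_dimension` before `store` or `search` would surface a model/index mismatch at the call site rather than as an index error later.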
| ## Edge Cases | |
| - The cache treats a missing Redis connection as a hard failure during initialization. | |
| - Creating a vector index with a mismatched schema or an incompatible dimension fails. | |
| - Expired entries are not guaranteed to be removed the instant their TTL elapses; they may persist until a read or cleanup operation occurs. | |
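
The TTL behaviour described in the last point can be pictured with an in-memory analogy. This is an illustration of lazy expiration in general, not of how Redis or the module implements it; `LazyTTLStore` and its injectable `clock` parameter are hypothetical.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class LazyTTLStore:
    """Hypothetical store whose expired entries linger until read or cleaned up."""

    def __init__(self, default_ttl: float = 3600,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._default_ttl = default_ttl
        self._clock = clock
        self._data: Dict[str, Tuple[float, Any]] = {}

    def set(self, key: str, value: Any, ttl: Optional[float] = None) -> None:
        lifetime = self._default_ttl if ttl is None else ttl
        self._data[key] = (self._clock() + lifetime, value)

    def get(self, key: str) -> Optional[Any]:
        item = self._data.get(key)
        if item is None:
            return None
        expires_at, value = item
        if self._clock() >= expires_at:
            del self._data[key]  # expired entry is dropped on read
            return None
        return value

    def cleanup(self) -> int:
        """Remove every expired entry and return how many were dropped."""
        now = self._clock()
        expired = [k for k, (exp, _) in self._data.items() if now >= exp]
        for k in expired:
            del self._data[k]
        return len(expired)

    def __len__(self) -> int:
        return len(self._data)
```

With a fake clock advanced past the TTL, `len(store)` still reports the stale entries until `get` or `cleanup` touches them, which is why monitoring should not assume that entry counts reflect only live entries.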