# NLProxy Cache Module Reference

This page documents the `cache/semantic_cache.py` module.

## Purpose

`SemanticLLMCache` provides a Redis-backed semantic cache for LLM prompt-response pairs. It stores each response with its metadata and the embedding vector of its prompt, so a response cached for a semantically similar prior prompt can be returned instead of being regenerated.

## Key Class

### `SemanticLLMCache`

#### Responsibilities

- Normalize and store embedding vectors in RedisVL.
- Search cached vectors based on cosine similarity.
- Enforce TTL-based expiration and domain isolation.
- Maintain hit/miss statistics.

#### Constructor

```python
SemanticLLMCache(
    redis_url: str = "redis://localhost:6379",
    similarity_threshold: float = 0.92,
    default_ttl: int = 3600,
    dimension: int = 384,
    index_name: str = "prompt_cache",
    prefix: str = "cache:",
    max_connections: int = 50,
    socket_timeout: float = 5.0,
)
```

#### Important Methods

- `_normalize(embedding: np.ndarray) -> List[float]`
  - Converts raw embeddings into L2-normalized Python lists.
  - Complexity: O(d).

- `store(query_embedding, response_text, metadata, domain)`
  - Stores a cached entry in a RedisVL vector index.
  - Writes both vector and metadata fields.

- `search(query_embedding, domain=None) -> Optional[Dict[str, Any]]`
  - Performs vector similarity search with threshold filtering.
  - Complexity: O(N · d) for a flat (brute-force) scan over N indexed entries of dimension d; the actual cost depends on the index algorithm RedisVL is configured with.

- `clear(domain: Optional[str] = None)`
  - Deletes cached entries globally or within a domain.

- `get_stats() -> Dict[str, int]`
  - Returns hit/miss counters.
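
The normalization and threshold-filtered search described above can be sketched in memory as follows. This is a minimal illustration, not the module's implementation: `normalize`, `flat_search`, and the entry keys `vector`, `domain`, and `response` are hypothetical names, and the brute-force loop stands in for the RedisVL vector query. Because both sides are L2-normalized, the dot product equals cosine similarity.

```python
from typing import Any, Dict, List, Optional

import numpy as np


def normalize(embedding: np.ndarray) -> List[float]:
    """Return the L2-normalized embedding as a plain Python list (O(d))."""
    vec = np.asarray(embedding, dtype=np.float32).ravel()
    norm = float(np.linalg.norm(vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return (vec / norm).tolist()


def flat_search(
    query: np.ndarray,
    entries: List[Dict[str, Any]],
    similarity_threshold: float = 0.92,
    domain: Optional[str] = None,
) -> Optional[Dict[str, Any]]:
    """Brute-force O(N * d) scan; returns the best entry at or above the threshold."""
    q = np.asarray(normalize(query))
    best, best_score = None, similarity_threshold
    for entry in entries:
        # Domain isolation: skip entries that belong to another domain.
        if domain is not None and entry["domain"] != domain:
            continue
        # Stored vectors are assumed to be normalized already.
        score = float(np.dot(q, np.asarray(entry["vector"])))
        if score >= best_score:
            best, best_score = entry, score
    return best
```

A query that falls below `similarity_threshold` for every candidate returns `None`, which corresponds to a cache miss.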

## Dependencies

- `redis` / `redis-py`
- `redisvl` for vector search index management
- `numpy`

## Performance Characteristics

- Embedding normalization is linear in embedding dimension.
- Search cost scales with number of indexed entries and vector size.
- RedisVL reduces query latency compared to raw key scans, but the module remains CPU-bound for large indexes.

## Scalability Considerations

- The Redis connection pool defaults to 50 connections and is configurable via `max_connections`.
- `socket_timeout` bounds how long a socket operation may block, so network faults fail fast instead of hanging.
- For high-volume deployments, Redis clustering or an approximate nearest neighbor store is recommended.

## Operational Guidelines

- Ensure `dimension` matches the embedding model output size.
- Configure `similarity_threshold` carefully; values near `1.0` reduce false positives but also lower the hit rate.
- Monitor hit/miss ratios and eviction trends.
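
For monitoring, a hit ratio can be derived from the counters that `get_stats()` returns. The sketch below assumes the dictionary uses the keys `hits` and `misses`; the source does not state the key names, so adjust them to whatever the module actually reports. `hit_ratio` is a hypothetical helper, not part of the module.

```python
from typing import Dict


def hit_ratio(stats: Dict[str, int]) -> float:
    """Fraction of lookups served from cache; 0.0 when no lookups were recorded."""
    hits = stats.get("hits", 0)      # assumed key name
    misses = stats.get("misses", 0)  # assumed key name
    total = hits + misses
    return hits / total if total else 0.0
```

Tracking this value while adjusting `similarity_threshold` makes the trade-off between false positives and hit rate visible.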

## Edge Cases

- The cache treats a missing Redis connection as a hard failure during initialization.
- A vector index with a mismatched schema or an incompatible dimension will fail to create.
- Expired entries are not necessarily removed the instant their TTL elapses; they may persist until a read or a cleanup operation occurs.
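
The TTL behavior in the last item can be illustrated with a small in-memory stand-in. This is only a sketch of lazy expiration as a concept; `LazyTTLStore` is a hypothetical class and does not reflect how the module or Redis implements expiry. The injectable `clock` parameter is also an assumption, added so the behavior can be exercised without waiting.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class LazyTTLStore:
    """In-memory stand-in that drops expired entries only on read or cleanup."""

    def __init__(
        self,
        default_ttl: int = 3600,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._default_ttl = default_ttl
        self._clock = clock
        self._items: Dict[str, Tuple[float, Any]] = {}

    def set(self, key: str, value: Any, ttl: Optional[int] = None) -> None:
        expires_at = self._clock() + (self._default_ttl if ttl is None else ttl)
        self._items[key] = (expires_at, value)

    def get(self, key: str) -> Optional[Any]:
        item = self._items.get(key)
        if item is None:
            return None
        expires_at, value = item
        if self._clock() >= expires_at:
            del self._items[key]  # removed lazily, at read time
            return None
        return value

    def cleanup(self) -> int:
        """Remove all expired entries and return how many were dropped."""
        now = self._clock()
        expired = [k for k, (exp, _) in self._items.items() if now >= exp]
        for k in expired:
            del self._items[k]
        return len(expired)

    def __len__(self) -> int:
        return len(self._items)
```

Between expiry and the next read or cleanup, an expired entry still occupies memory, which is why eviction trends are worth monitoring alongside hit/miss ratios.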