| |
| |
| |
|
|
| |
| Goal: Provide a low-latency, highly available distributed caching layer that supports consistent hashing for horizontal scaling, multi-region replication, configurable eviction policies, and transparent read-through/write-through integration with backend data stores. |
|
|
| Constraint: Cache consistency must follow eventual consistency model with configurable max staleness bound |
| Constraint: Consistent hashing ring must distribute keys within 10% of uniform distribution |
| Constraint: Replication factor must ensure data survives any single node failure |
| Constraint: Eviction must occur before memory usage exceeds 95% of allocated capacity |
| Constraint: All cache operations must complete within 5ms P99 at the local region |
|
|
| Risk: Cache stampede causing thundering herd on cache miss |
| Recovery: Implement request coalescing for in-flight key fetches; only one request per key fetches from backend; other concurrent requests wait on shared future; probabilistic early expiration to stagger re-fetches |
|
|
| Risk: Hotspot key overwhelming single node despite consistent hashing |
| Recovery: Implement virtual nodes (150+ per physical node) for better distribution; for extreme hotspots, replicate hot keys to multiple nodes using read replicas; client-side local caching for top-K hot keys |
|
|
| Risk: Network partition causing cache inconsistency between regions |
| Recovery: Version vectors track causality; conflict resolution using last-write-wins with synchronized clocks; on partition heal, anti-entropy protocol reconciles divergent entries using version comparison; tombstones for deletions |
|
|
| Risk: Node failure causing temporary data loss and increased backend load |
| Recovery: Replicated copies on adjacent ring positions serve reads; background re-replication fills new node from replicas; client retry with replica read on primary failure; circuit breaker prevents backend cascade |
|
|
| Risk: Memory pressure causing unpredictable eviction of recently cached data |
| Recovery: Segmented LRU (protected/probationary) with admission window; items must prove value before entering protected segment; dynamic pool sizing based on hit rate feedback; OOM protection via hard eviction threshold |
|
|
| Risk: Consistent hashing rebalancing causing massive key migration on topology change |
| Recovery: Use bounded loads consistent hashing to limit key movement; vnodes ensure only affected partitions move; rebalancing throttled to prevent network saturation; clients update ring topology lazily |
|
|
| Layer: CacheCore |
| SubLayer: HashRing |
| SubLayer: StorageEngine |
| SubLayer: EvictionManager |
| SubLayer: ReplicationManager |
| Layer: ClientProtocol |
| SubLayer: ReadThrough |
| SubLayer: WriteThrough |
| SubLayer: CacheAside |
| SubLayer: Invalidation |
| Layer: Consistency |
| SubLayer: VersionTracking |
| SubLayer: AntiEntropy |
| SubLayer: ConflictResolution |
|
|
| Validation: Consistent hashing ring must have at least 150 virtual nodes per physical node |
| Validation: Replication factor must be at least 2 for any write-acknowledged key |
| Validation: Eviction policy must protect recently admitted items from immediate re-eviction |
| Validation: Read-through callback must have a timeout of at most 500ms |
| Validation: Cache invalidation must propagate to all replicas within the consistency window |
| Validation: Memory usage must not exceed 95% before triggering proactive eviction |
|
|
| |
| Entity CacheNode |
| nodeId: string |
| host: string |
| port: integer |
| region: string |
| virtualNodeCount: integer |
| memoryUsedBytes: integer |
| memoryTotalBytes: integer |
| cpuUtilization: float |
| status: string |
| lastHeartbeat: datetime |
|
|
| Entity CacheEntry |
| key: string |
| value: bytes |
| version: integer |
| regionVersionVector: dict |
| createdAt: datetime |
| accessedAt: datetime |
| ttlSeconds: integer |
| evictionTier: string |
| accessCount: integer |
| sizeBytes: integer |
|
|
| Entity HashRing |
| ringId: string |
| nodes: list |
| virtualNodes: dict |
| replicationFactor: integer |
| totalSlots: integer |
| version: integer |
| lastRebalanced: datetime |
| maxKeysPerVnode: integer |
|
|
| Entity EvictionPolicy |
| policyName: string |
| protectedRatio: float |
| probationaryRatio: float |
| admissionWindow: integer |
| maxEntrySizeBytes: integer |
| dynamicAdjustmentEnabled: boolean |
| hitRateTarget: float |
| currentHitRate: float |
|
|
| Entity ReplicationConfig |
| replicationFactor: integer |
| writeQuorum: integer |
| readQuorum: integer |
| consistencyLevel: string |
| replicationLagThresholdMs: integer |
| asyncReplicationBatchSize: integer |
| conflictResolutionStrategy: string |
|
|
| Entity InvalidationMessage |
| invalidationId: string |
| key: string |
| keyPattern: string |
| originNode: string |
| reason: string |
| version: integer |
| timestamp: datetime |
| targetRegions: list |
| propagated: boolean |
|
|
| |
| Behavior GetEntry |
| Input: key: string, consistencyLevel: string, timeoutMs: integer |
| Output: value: bytes, version: integer, hit: boolean, sourceNode: string |
| Action: |
| Hash key to locate primary node on the ring |
| If local hit, return value from storage engine |
| If miss, check replica nodes based on consistency level |
| If read-through enabled, coalesce with in-flight fetches for same key |
| Fetch from backend if all cache nodes miss; store result on primary |
| Apply TTL check and return stale-if-valid if within grace period |
| Update access metadata for eviction scoring |
|
|
| Behavior PutEntry |
| Input: key: string, value: bytes, ttlSeconds: integer, writeThrough: boolean, consistencyLevel: string |
| Output: version: integer, replicatedNodes: list, latencyMs: float |
| Action: |
| Hash key to determine primary and replica nodes |
| Check admission policy before accepting entry |
| Write to primary node storage engine |
| Replicate to N-1 replica nodes based on replication factor |
| Await write quorum acknowledgment before responding |
| If write-through enabled, synchronously write to backend data store |
| Invalidate stale copies on non-primary nodes |
| Return version and replication confirmation |
|
|
| Behavior InvalidateKey |
| Input: key: string, pattern: string, reason: string, synchronous: boolean |
| Output: invalidatedNodes: list, invalidationId: string |
| Action: |
| Determine all nodes holding the key or keys matching pattern |
| Delete entry from primary and all replica nodes |
| Publish invalidation message to cross-region replication channels |
| If synchronous, await acknowledgment from all target nodes |
| If asynchronous, fire-and-forget with best-effort delivery |
| Log invalidation for audit trail |
| Emit invalidation metric with latency and scope |
|
|
| Behavior RebalanceRing |
| Input: addedNodes: list, removedNodes: list |
| Output: migratedKeys: integer, durationMs: float, newVersion: integer |
| Action: |
| Compute new ring topology with virtual node assignments |
| Identify keys that must migrate to new primary nodes |
| Transfer key ranges to new owners with streaming |
| Verify data integrity on destination nodes |
| Update routing table atomically across all clients |
| Throttle migration rate to avoid impacting production traffic |
| Increment ring version for client topology refresh |
|
|
| Behavior EvictEntries |
| Input: targetMemoryBytes: integer, policy: EvictionPolicy |
| Output: evictedCount: integer, freedBytes: integer |
| Action: |
| Scan probationary segment of segmented LRU first |
| Apply admission filter to protect frequently accessed items |
| If probationary insufficient, evict from protected segment's tail |
| Respect TTL-based expiration before policy-based eviction |
| Update hit rate metrics after each eviction batch |
| Dynamically adjust protected/probationary ratio based on hit rate |
| Emit eviction count and memory metrics |
| |
| Behavior ResolveConflict |
| Input: key: string, localEntry: CacheEntry, remoteEntry: CacheEntry |
| Output: resolvedEntry: CacheEntry, resolution: string |
| Action: |
| Compare version vectors for causal ordering |
| If one version dominates, use that entry as resolved |
| If concurrent versions, apply conflict resolution strategy |
| For last-write-wins, compare timestamps from synchronized clocks |
| For merge, combine non-overlapping fields if applicable |
| Write resolved entry and propagate anti-entropy update |
| Log conflict for analysis of consistency model violations |
| |
| # Level 4: Conditions |
| Condition: MemoryThresholdBreached |
| When cache node memory usage exceeds 90% of allocated capacity |
| Then trigger proactive eviction of probationary tier entries; if insufficient, evict from protected tier; throttle new admissions until usage drops below 85%; emit critical memory metric |
| |
| Condition: ReplicationLagExceeded |
| When replica falls behind primary by more than replicationLagThresholdMs |
| Then mark replica as stale for read quorum calculations; redirect reads to healthy replicas; trigger catchup replication stream; if catchup fails within 5 minutes, mark replica as degraded |
| |
| Condition: ConsistentHashImbalance |
| When any node's key count deviates more than 15% from the average |
| Then redistribute virtual nodes to balance load; consider adding virtual nodes to underloaded physical nodes; emit imbalance metric; trigger background rebalancing |
|
|
| Condition: BackendUnreachable |
| When read-through or write-through backend becomes unreachable |
| Then serve stale cache entries beyond TTL within grace period; queue write-through writes for async backend delivery; circuit break backend calls after 5 consecutive failures; emit backend unavailability alert |
|
|
| Condition: StampedeDetected |
| When more than 10 concurrent requests miss the same key within 100ms |
| Then coalesce all requests into single backend fetch; return shared future result to all waiters; implement probabilistic early expiration (PER) to stagger future re-fetches; emit stampede metric |
|
|
| |
| Event: OnNodeJoined |
| On new cache node added to the cluster |
| Action: Register node on hash ring, assign virtual nodes, begin key migration from overloaded neighbors, update client routing tables, start replication streams, emit node-join metric |
|
|
| Event: OnNodeFailed |
| On cache node declared failed after missing heartbeats |
| Action: Remove from hash ring, promote replicas to primary for affected keys, trigger re-replication to restore replication factor, update client routing tables, begin data recovery from remaining replicas |
|
|
| Event: OnCacheHitRateDrop |
| On global cache hit rate drops below configured target for 5 consecutive minutes |
| Action: Analyze eviction patterns, increase protected segment ratio, adjust admission policy, identify churn patterns, consider increasing cache capacity, emit hit rate degradation alert |
|
|
| Event: OnInvalidationBroadcast |
| On invalidation message received from another region |
| Action: Apply invalidation to local storage, acknowledge if synchronous, update version vector, propagate to any downstream replicas, emit cross-region invalidation metric |
|
|
| Event: OnRebalanceComplete |
| On ring rebalancing migration finishes |
| Action: Verify all key ranges are correctly placed, update global routing table, validate replication factor for all keys, emit rebalance completion metric with migration stats, enable full traffic on new nodes |
|
|
| |
| Parallel: |
| Independent cache get/put operations across different key partitions |
| Background replication streams to replica nodes |
| Anti-entropy reconciliation between regions |
| Eviction scanning and memory management |
| Client routing table updates |
|
|
| |
| Optimize: Cache hit rate |
| Priority: Use segmented LRU with dynamic admission policy; implement probabilistic early expiration to prevent stampedes; prefetch keys with predictable access patterns; adjust protected/probationary ratio based on real-time hit rate |
|
|
| Optimize: Replication throughput |
| Priority: Batch replication updates into pipeline messages; use async replication for non-critical consistency levels; compress replication payloads with lz4; parallelize replication to independent replicas |
|
|
| Optimize: Memory utilization |
| Priority: Use slab allocation for common value sizes; compress cold entries with zstd; implement object pooling for cache entry metadata; memory-mapped storage for items exceeding threshold size |
|
|
| |
| Learn: Optimal TTL for different key patterns |
| Goal: Maximize hit rate while minimizing stale data served |
| Adapt: default TTL per key prefix pattern |
| Based: Access frequency, backend update frequency, and staleness tolerance observed over 7-day windows |
|
|
| Learn: Admission policy thresholds |
| Goal: Prevent cache pollution from one-time access keys while retaining valuable entries |
| Adapt: admissionWindow and minimumAccessCount for protected tier |
| Based: Historical eviction patterns showing recently-evicted items being re-requested vs. truly cold items |
|
|
| Learn: Replication factor per key namespace |
| Goal: Use higher replication for hot keys and lower replication for cold keys to optimize memory |
| Adapt: dynamicReplicationFactor per key namespace |
| Based: Access frequency distribution, read quorum requirements, and node failure probability |
|
|
| |
| Security: |
| Encrypt: All inter-node replication traffic using TLS 1.3 |
| Encrypt: Client-to-cache connections using TLS with SASL authentication |
| Encrypt: Cache entry values at rest using AES-256-GCM for sensitive key prefixes |
| Protect: Key-level access control via ACLs (read, write, admin) |
| Protect: Admin operations (flush, rebalance, config change) via RBAC |
| Protect: Against cache poisoning by validating write authorization per namespace |
| Protect: Against side-channel attacks via constant-time key comparison for authentication |
|
|
| |
| Native: C++ |
| { |
| class ConsistentHashRing { |
| std::map<uint64_t, std::string> ring_; |
| std::unordered_map<std::string, std::vector<uint64_t>> vnodes_; |
| int replication_factor_; |
| int vnodes_per_node_; |
| |
| public: |
| void addNode(const std::string& node_id) { |
| std::hash<std::string> hasher; |
| for (int i = 0; i < vnodes_per_node_; ++i) { |
| std::string vnode_key = node_id + ":" + std::to_string(i); |
| uint64_t hash = hasher(vnode_key); |
| ring_[hash] = node_id; |
| vnodes_[node_id].push_back(hash); |
| } |
| } |
|
|
| void removeNode(const std::string& node_id) { |
| for (uint64_t hash : vnodes_[node_id]) { |
| ring_.erase(hash); |
| } |
| vnodes_.erase(node_id); |
| } |
|
|
| std::vector<std::string> getNodes(const std::string& key) { |
| std::vector<std::string> nodes; |
| if (ring_.empty()) return nodes; |
| |
| std::hash<std::string> hasher; |
| uint64_t hash = hasher(key); |
| auto it = ring_.lower_bound(hash); |
| if (it == ring_.end()) it = ring_.begin(); |
| |
| std::set<std::string> seen; |
| for (int i = 0; i < replication_factor_ && seen.size() < (size_t)replication_factor_; ++i) { |
| if (seen.find(it->second) == seen.end()) { |
| nodes.push_back(it->second); |
| seen.insert(it->second); |
| } |
| ++it; |
| if (it == ring_.end()) it = ring_.begin(); |
| } |
| return nodes; |
| } |
| }; |
|
|
| class SegmentedLRU { |
| struct Entry { |
| std::string key; |
| std::vector<uint8_t> value; |
| uint64_t version; |
| std::chrono::steady_clock::time_point expire_at; |
| size_t access_count; |
| int tier; |
| }; |
| |
| std::list<Entry> protected_; |
| std::list<Entry> probationary_; |
| std::unordered_map<std::string, std::list<Entry>::iterator> index_; |
| size_t protected_capacity_; |
| size_t probationary_capacity_; |
| |
| public: |
| std::optional<Entry> get(const std::string& key) { |
| auto it = index_.find(key); |
| if (it == index_.end()) return std::nullopt; |
| |
| Entry& e = *(it->second); |
| if (std::chrono::steady_clock::now() > e.expire_at) { |
| erase(key); |
| return std::nullopt; |
| } |
| |
| e.access_count++; |
| if (e.tier == 0 && e.access_count > 1) { |
| promote(it->second); |
| } |
| return e; |
| } |
| |
| void promote(std::list<Entry>::iterator it) { |
| Entry e = std::move(*it); |
| e.tier = 1; |
| probationary_.erase(it); |
| |
| while (protected_.size() >= protected_capacity_) { |
| Entry victim = std::move(protected_.back()); |
| victim.tier = 0; |
| victim.access_count = 1; |
| protected_.pop_back(); |
| probationary_.push_front(std::move(victim)); |
| index_[probationary_.front().key] = probationary_.begin(); |
| } |
| |
| protected_.push_front(std::move(e)); |
| index_[protected_.front().key] = protected_.begin(); |
| } |
| }; |
|
|