# AICL Example: Distributed Cache # Implements a distributed cache with consistent hashing, replication, eviction policies, # cache invalidation, read-through/write-through patterns, and backfill strategies. # Level 1: Architecture Goal: Provide a low-latency, highly available distributed caching layer that supports consistent hashing for horizontal scaling, multi-region replication, configurable eviction policies, and transparent read-through/write-through integration with backend data stores. Constraint: Cache consistency must follow eventual consistency model with configurable max staleness bound Constraint: Consistent hashing ring must distribute keys within 10% of uniform distribution Constraint: Replication factor must ensure data survives any single node failure Constraint: Eviction must occur before memory usage exceeds 95% of allocated capacity Constraint: All cache operations must complete within 5ms P99 at the local region Risk: Cache stampede causing thundering herd on cache miss Recovery: Implement request coalescing for in-flight key fetches; only one request per key fetches from backend; other concurrent requests wait on shared future; probabilistic early expiration to stagger re-fetches Risk: Hotspot key overwhelming single node despite consistent hashing Recovery: Implement virtual nodes (150+ per physical node) for better distribution; for extreme hotspots, replicate hot keys to multiple nodes using read replicas; client-side local caching for top-K hot keys Risk: Network partition causing cache inconsistency between regions Recovery: Version vectors track causality; conflict resolution using last-write-wins with synchronized clocks; on partition heal, anti-entropy protocol reconciles divergent entries using version comparison; tombstones for deletions Risk: Node failure causing temporary data loss and increased backend load Recovery: Replicated copies on adjacent ring positions serve reads; background re-replication fills new node from replicas; client retry with replica read on primary failure; circuit breaker prevents backend cascade Risk: Memory pressure causing unpredictable eviction of recently cached data Recovery: Segmented LRU (protected/probationary) with admission window; items must prove value before entering protected segment; dynamic pool sizing based on hit rate feedback; OOM protection via hard eviction threshold Risk: Consistent hashing rebalancing causing massive key migration on topology change Recovery: Use bounded loads consistent hashing to limit key movement; vnodes ensure only affected partitions move; rebalancing throttled to prevent network saturation; clients update ring topology lazily Layer: CacheCore SubLayer: HashRing SubLayer: StorageEngine SubLayer: EvictionManager SubLayer: ReplicationManager Layer: ClientProtocol SubLayer: ReadThrough SubLayer: WriteThrough SubLayer: CacheAside SubLayer: Invalidation Layer: Consistency SubLayer: VersionTracking SubLayer: AntiEntropy SubLayer: ConflictResolution Validation: Consistent hashing ring must have at least 150 virtual nodes per physical node Validation: Replication factor must be at least 2 for any write-acknowledged key Validation: Eviction policy must protect recently admitted items from immediate re-eviction Validation: Read-through callback must have a timeout of at most 500ms Validation: Cache invalidation must propagate to all replicas within the consistency window Validation: Memory usage must not exceed 95% before triggering proactive eviction # Level 2: Entities Entity CacheNode nodeId: string host: string port: integer region: string virtualNodeCount: integer memoryUsedBytes: integer memoryTotalBytes: integer cpuUtilization: float status: string lastHeartbeat: datetime Entity CacheEntry key: string value: bytes version: integer regionVersionVector: dict createdAt: datetime accessedAt: datetime ttlSeconds: integer evictionTier: string accessCount: integer sizeBytes: integer Entity HashRing ringId: string nodes: list virtualNodes: dict replicationFactor: integer totalSlots: integer version: integer lastRebalanced: datetime maxKeysPerVnode: integer Entity EvictionPolicy policyName: string protectedRatio: float probationaryRatio: float admissionWindow: integer maxEntrySizeBytes: integer dynamicAdjustmentEnabled: boolean hitRateTarget: float currentHitRate: float Entity ReplicationConfig replicationFactor: integer writeQuorum: integer readQuorum: integer consistencyLevel: string replicationLagThresholdMs: integer asyncReplicationBatchSize: integer conflictResolutionStrategy: string Entity InvalidationMessage invalidationId: string key: string keyPattern: string originNode: string reason: string version: integer timestamp: datetime targetRegions: list propagated: boolean # Level 3: Behaviors Behavior GetEntry Input: key: string, consistencyLevel: string, timeoutMs: integer Output: value: bytes, version: integer, hit: boolean, sourceNode: string Action: Hash key to locate primary node on the ring If local hit, return value from storage engine If miss, check replica nodes based on consistency level If read-through enabled, coalesce with in-flight fetches for same key Fetch from backend if all cache nodes miss; store result on primary Apply TTL check and return stale-if-valid if within grace period Update access metadata for eviction scoring Behavior PutEntry Input: key: string, value: bytes, ttlSeconds: integer, writeThrough: boolean, consistencyLevel: string Output: version: integer, replicatedNodes: list, latencyMs: float Action: Hash key to determine primary and replica nodes Check admission policy before accepting entry Write to primary node storage engine Replicate to N-1 replica nodes based on replication factor Await write quorum acknowledgment before responding If write-through enabled, synchronously write to backend data store Invalidate stale copies on non-primary nodes Return version and replication confirmation Behavior InvalidateKey Input: key: string, pattern: string, reason: string, synchronous: boolean Output: invalidatedNodes: list, invalidationId: string Action: Determine all nodes holding the key or keys matching pattern Delete entry from primary and all replica nodes Publish invalidation message to cross-region replication channels If synchronous, await acknowledgment from all target nodes If asynchronous, fire-and-forget with best-effort delivery Log invalidation for audit trail Emit invalidation metric with latency and scope Behavior RebalanceRing Input: addedNodes: list, removedNodes: list Output: migratedKeys: integer, durationMs: float, newVersion: integer Action: Compute new ring topology with virtual node assignments Identify keys that must migrate to new primary nodes Transfer key ranges to new owners with streaming Verify data integrity on destination nodes Update routing table atomically across all clients Throttle migration rate to avoid impacting production traffic Increment ring version for client topology refresh Behavior EvictEntries Input: targetMemoryBytes: integer, policy: EvictionPolicy Output: evictedCount: integer, freedBytes: integer Action: Scan probationary segment of segmented LRU first Apply admission filter to protect frequently accessed items If probationary insufficient, evict from protected segment's tail Respect TTL-based expiration before policy-based eviction Update hit rate metrics after each eviction batch Dynamically adjust protected/probationary ratio based on hit rate Emit eviction count and memory metrics Behavior ResolveConflict Input: key: string, localEntry: CacheEntry, remoteEntry: CacheEntry Output: resolvedEntry: CacheEntry, resolution: string Action: Compare version vectors for causal ordering If one version dominates, use that entry as resolved If concurrent versions, apply conflict resolution strategy For last-write-wins, compare timestamps from synchronized clocks For merge, combine non-overlapping fields if applicable Write resolved entry and propagate anti-entropy update Log conflict for analysis of consistency model violations # Level 4: Conditions Condition: MemoryThresholdBreached When cache node memory usage exceeds 90% of allocated capacity Then trigger proactive eviction of probationary tier entries; if insufficient, evict from protected tier; throttle new admissions until usage drops below 85%; emit critical memory metric Condition: ReplicationLagExceeded When replica falls behind primary by more than replicationLagThresholdMs Then mark replica as stale for read quorum calculations; redirect reads to healthy replicas; trigger catchup replication stream; if catchup fails within 5 minutes, mark replica as degraded Condition: ConsistentHashImbalance When any node's key count deviates more than 15% from the average Then redistribute virtual nodes to balance load; consider adding virtual nodes to underloaded physical nodes; emit imbalance metric; trigger background rebalancing Condition: BackendUnreachable When read-through or write-through backend becomes unreachable Then serve stale cache entries beyond TTL within grace period; queue write-through writes for async backend delivery; circuit break backend calls after 5 consecutive failures; emit backend unavailability alert Condition: StampedeDetected When more than 10 concurrent requests miss the same key within 100ms Then coalesce all requests into single backend fetch; return shared future result to all waiters; implement probabilistic early expiration (PER) to stagger future re-fetches; emit stampede metric # Level 5: Events Event: OnNodeJoined On new cache node added to the cluster Action: Register node on hash ring, assign virtual nodes, begin key migration from overloaded neighbors, update client routing tables, start replication streams, emit node-join metric Event: OnNodeFailed On cache node declared failed after missing heartbeats Action: Remove from hash ring, promote replicas to primary for affected keys, trigger re-replication to restore replication factor, update client routing tables, begin data recovery from remaining replicas Event: OnCacheHitRateDrop On global cache hit rate drops below configured target for 5 consecutive minutes Action: Analyze eviction patterns, increase protected segment ratio, adjust admission policy, identify churn patterns, consider increasing cache capacity, emit hit rate degradation alert Event: OnInvalidationBroadcast On invalidation message received from another region Action: Apply invalidation to local storage, acknowledge if synchronous, update version vector, propagate to any downstream replicas, emit cross-region invalidation metric Event: OnRebalanceComplete On ring rebalancing migration finishes Action: Verify all key ranges are correctly placed, update global routing table, validate replication factor for all keys, emit rebalance completion metric with migration stats, enable full traffic on new nodes # Level 6: Concurrency Parallel: Independent cache get/put operations across different key partitions Background replication streams to replica nodes Anti-entropy reconciliation between regions Eviction scanning and memory management Client routing table updates # Level 7: Optimization Optimize: Cache hit rate Priority: Use segmented LRU with dynamic admission policy; implement probabilistic early expiration to prevent stampedes; prefetch keys with predictable access patterns; adjust protected/probationary ratio based on real-time hit rate Optimize: Replication throughput Priority: Batch replication updates into pipeline messages; use async replication for non-critical consistency levels; compress replication payloads with lz4; parallelize replication to independent replicas Optimize: Memory utilization Priority: Use slab allocation for common value sizes; compress cold entries with zstd; implement object pooling for cache entry metadata; memory-mapped storage for items exceeding threshold size # Level 8: Learning Learn: Optimal TTL for different key patterns Goal: Maximize hit rate while minimizing stale data served Adapt: default TTL per key prefix pattern Based: Access frequency, backend update frequency, and staleness tolerance observed over 7-day windows Learn: Admission policy thresholds Goal: Prevent cache pollution from one-time access keys while retaining valuable entries Adapt: admissionWindow and minimumAccessCount for protected tier Based: Historical eviction patterns showing recently-evicted items being re-requested vs. truly cold items Learn: Replication factor per key namespace Goal: Use higher replication for hot keys and lower replication for cold keys to optimize memory Adapt: dynamicReplicationFactor per key namespace Based: Access frequency distribution, read quorum requirements, and node failure probability # Level 9: Security Security: Encrypt: All inter-node replication traffic using TLS 1.3 Encrypt: Client-to-cache connections using TLS with SASL authentication Encrypt: Cache entry values at rest using AES-256-GCM for sensitive key prefixes Protect: Key-level access control via ACLs (read, write, admin) Protect: Admin operations (flush, rebalance, config change) via RBAC Protect: Against cache poisoning by validating write authorization per namespace Protect: Against side-channel attacks via constant-time key comparison for authentication # Level 10: Native Native: C++ { class ConsistentHashRing { std::map ring_; std::unordered_map> vnodes_; int replication_factor_; int vnodes_per_node_; public: void addNode(const std::string& node_id) { std::hash hasher; for (int i = 0; i < vnodes_per_node_; ++i) { std::string vnode_key = node_id + ":" + std::to_string(i); uint64_t hash = hasher(vnode_key); ring_[hash] = node_id; vnodes_[node_id].push_back(hash); } } void removeNode(const std::string& node_id) { for (uint64_t hash : vnodes_[node_id]) { ring_.erase(hash); } vnodes_.erase(node_id); } std::vector getNodes(const std::string& key) { std::vector nodes; if (ring_.empty()) return nodes; std::hash hasher; uint64_t hash = hasher(key); auto it = ring_.lower_bound(hash); if (it == ring_.end()) it = ring_.begin(); std::set seen; for (int i = 0; i < replication_factor_ && seen.size() < (size_t)replication_factor_; ++i) { if (seen.find(it->second) == seen.end()) { nodes.push_back(it->second); seen.insert(it->second); } ++it; if (it == ring_.end()) it = ring_.begin(); } return nodes; } }; class SegmentedLRU { struct Entry { std::string key; std::vector value; uint64_t version; std::chrono::steady_clock::time_point expire_at; size_t access_count; int tier; // 0=probationary, 1=protected }; std::list protected_; std::list probationary_; std::unordered_map::iterator> index_; size_t protected_capacity_; size_t probationary_capacity_; public: std::optional get(const std::string& key) { auto it = index_.find(key); if (it == index_.end()) return std::nullopt; Entry& e = *(it->second); if (std::chrono::steady_clock::now() > e.expire_at) { erase(key); return std::nullopt; } e.access_count++; if (e.tier == 0 && e.access_count > 1) { promote(it->second); } return e; } void promote(std::list::iterator it) { Entry e = std::move(*it); e.tier = 1; probationary_.erase(it); while (protected_.size() >= protected_capacity_) { Entry victim = std::move(protected_.back()); victim.tier = 0; victim.access_count = 1; protected_.pop_back(); probationary_.push_front(std::move(victim)); index_[probationary_.front().key] = probationary_.begin(); } protected_.push_front(std::move(e)); index_[protected_.front().key] = protected_.begin(); } };