# Cache System Migration Guide

## 🎯 TL;DR - What Changed

**Auto-versioning is now ENABLED BY DEFAULT.** Your cache will automatically invalidate when function code changes. This prevents stale cache bugs.

**Most users need to do nothing** - just update and enjoy automatic cache invalidation.

**Only opt out if:**

- The function takes hours/days to compute, AND
- The function is stable and won't change, AND
- You understand the risk of stale results

---

## Overview of Changes

1. **`auto_versioning=True` by default**: Cache keys include a hash of the function's source
2. **One decorator to rule them all**: `@cacheable()` replaces multiple decorators
3. **Removed `smart_cacheable`**: Now redundant (built into the default behavior)
4. **Selective cleaner refocused**: Maintenance tool for orphaned caches

---

## Quick Migration Table

| Old Code | New Code | Notes |
|----------|----------|-------|
| `@robust_cacheable` | `@cacheable()` | Now has auto-versioning by default |
| `@time_aware_cacheable` | `@cacheable(time_aware=True)` | Now has auto-versioning by default |
| `@cv_cacheable` | `@cacheable()` | Now has auto-versioning by default |
| `@smart_cacheable` | `@cacheable()` | **REMOVED - now default behavior** |
| `@cacheable()` (old) | `@cacheable(auto_versioning=False)` | **Only if you need the old behavior** |

---

## What is Auto-Versioning?

### The Problem It Solves

```python
# Without auto-versioning
@cacheable(auto_versioning=False)
def calculate_returns(prices):
    return prices.pct_change()

calculate_returns(df)  # Cache miss, stores result
calculate_returns(df)  # Cache hit ✓

# Developer fixes a bug
def calculate_returns(prices):
    return prices.pct_change().fillna(0)  # Bug fix!

calculate_returns(df)  # Cache HIT - WRONG RESULT!
# ❌ Returns OLD buggy result from cache
```

### With Auto-Versioning (Now Default)

```python
# With auto-versioning (NEW DEFAULT)
@cacheable()  # auto_versioning=True by default
def calculate_returns(prices):
    return prices.pct_change()

calculate_returns(df)  # Cache miss, stores at key "v_abc123..."
calculate_returns(df)  # Cache hit ✓

# Developer fixes a bug
def calculate_returns(prices):
    return prices.pct_change().fillna(0)  # Bug fix!

calculate_returns(df)  # Cache MISS - new key "v_def456..." ✓
# Computes with the NEW correct code
```

---

## Migration Steps

### Step 1: Update `smart_cacheable` (REQUIRED)

**Old code:**

```python
from afml.cache import smart_cacheable

@smart_cacheable
def my_function(data):
    return data.mean()
```

**New code:**

```python
from afml.cache import cacheable

@cacheable()  # That's it! auto_versioning is now the default
def my_function(data):
    return data.mean()
```

### Step 2: Review Expensive Functions (OPTIONAL)

If you have functions that take **hours to compute** and **rarely change**:

```python
@cacheable(auto_versioning=False)  # Explicit opt-out
def train_huge_model(data):
    """Takes 48 hours, changes once per year"""
    return expensive_training(data)
```

⚠️ **Warning**: With `auto_versioning=False`, even code changes will NOT invalidate the cache:

```python
@cacheable(auto_versioning=False)
def train_huge_model(data):
    """Added this docstring"""  # THIS CHANGE WON'T INVALIDATE CACHE
    return expensive_training(data)  # May return a stale result!
```

### Step 3: Clean Up Old Caches (RECOMMENDED)

After migration, clean up orphaned caches:

```python
from afml.cache import cache_maintenance

# One-time cleanup after migration
cache_maintenance(
    clean_orphaned=True,
    max_cache_size_mb=1000,
    max_age_days=30
)
```

---

## Understanding Auto-Versioning Behavior

### How Cache Keys Work

**Without auto-versioning:**

```
cache_key = md5("module.function_name" + "arg_hashes")
          = "a1b2c3d4..."
```

**With auto-versioning (default):**

```
cache_key = md5("module.function_name" + "v_abc123" + "arg_hashes")
                                          ^^^^^^^^
                                          function source hash
          = "e5f6a7b8..."  # Different key!
```

### When Cache Invalidates

The cache invalidates when:

- ✅ Function body changes
- ✅ Function name changes
- ✅ Default parameters change
- ✅ Decorators change

It does **not** invalidate when:

- ❌ Comments change
- ❌ Docstrings change

### Graceful Fallback

For built-in or dynamically created functions, the source is unavailable and cannot be hashed:

```python
# Can't get source for a C built-in, so there is nothing to hash
cached_min = cacheable()(min)

# Gracefully falls back to file mtime for versioning
# A warning is logged, but nothing crashes
```

---

## Common Scenarios

### Scenario 1: Development (Default - No Changes Needed)

```python
from afml.cache import cacheable

@cacheable()  # Just use defaults!
def my_feature(data, window):
    """Feature under active development"""
    return data.rolling(window).mean()

# Work normally - cache auto-invalidates on changes
result1 = my_feature(df, 20)
result2 = my_feature(df, 20)  # Cache hit

# ... modify my_feature ...
result3 = my_feature(df, 20)  # Cache miss (automatic!)
```

### Scenario 2: Expensive Computation (Explicit Opt-Out)

```python
from afml.cache import cacheable

@cacheable(auto_versioning=False)  # Explicit opt-out
def train_production_model(data):
    """Takes 24 hours, changes rarely, want to preserve cache"""
    return expensive_training(data)
```

### Scenario 3: Bulk Opt-Out for Stable Functions

```python
from afml.cache import disable_auto_versioning

# Create a custom decorator without versioning
cacheable_stable = disable_auto_versioning()

@cacheable_stable()
def stable_func_1(data): ...

@cacheable_stable()
def stable_func_2(data): ...

@cacheable_stable(time_aware=True)  # Can combine with other options
def stable_func_3(data): ...
```

### Scenario 4: Mixed Strategy

```python
from afml.cache import cacheable

# Under development - auto-versioning
@cacheable()
def experimental_feature(data):
    return data.ewm(span=20).mean()

# Production stable - opt-out
@cacheable(auto_versioning=False)
def load_data(symbol, start, end):
    return expensive_data_load(symbol, start, end)
```

---

## Maintenance & Cleanup

### Periodic Cleanup (Recommended)

Set up weekly or monthly cleanup:

```python
from afml.cache import cache_maintenance

# Run weekly via cron/scheduler
cache_maintenance(
    clean_orphaned=True,      # Remove old function versions
    max_cache_size_mb=2000,   # Enforce size limit
    max_age_days=90,          # Remove very old caches
    min_orphan_age_hours=48   # Keep recent orphans (grace period)
)
```

### Analyze Cache Fragmentation

Check whether auto-versioning is creating too many versions:

```python
from afml.cache import print_version_analysis

print_version_analysis()

# Output:
# ========================================
# CACHE VERSION ANALYSIS
# ========================================
# Functions with versions: 12
# Total versions: 34
# Total size: 1.2 GB
#
# Top fragmented functions:
#   1. calculate_feature
#      Versions: 8
#      Size: 450 MB
```

If fragmentation is high, consider opting out for those functions.

---

## Performance Implications

### Overhead of Auto-Versioning

**Minimal overhead** - the hash is computed once, when the decorator is applied:

```python
# Old smart_cacheable: 0.5ms PER CALL
@smart_cacheable  # Read source + hash on EVERY call
def fast_func(x):
    return x + 1

# New auto_versioning: 0ms per call
@cacheable()  # Hash computed ONCE at import time
def fast_func(x):
    return x + 1
```

### Storage Implications

With auto-versioning, multiple versions can coexist temporarily:

```bash
cache/
  my_module/
    my_function/
      v_abc123_args_xyz/   # Version 1 (orphaned)
      v_def456_args_xyz/   # Version 2 (orphaned)
      v_ghi789_args_xyz/   # Version 3 (current)
```

**Mitigation**: Run `cache_maintenance()` periodically to clean orphans.
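The mechanics behind this layout - one source hash computed at decoration time, combined with per-call argument hashes - can be sketched in plain Python. This is an illustrative toy (`sketch_cacheable` is a made-up name), not the `afml.cache` implementation:

```python
import functools
import hashlib
import inspect

def sketch_cacheable(func):
    """Toy auto-versioning cache: source hashed ONCE, at decoration time."""
    try:
        src = inspect.getsource(func)
        version = "v_" + hashlib.md5(src.encode()).hexdigest()[:8]
    except (OSError, TypeError):
        version = "v_unversioned"  # the real library falls back to file mtime
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        # Per-call work is just hashing the arguments, not re-reading source
        arg_hash = hashlib.md5(repr(args).encode()).hexdigest()[:8]
        key = f"{func.__module__}.{func.__qualname__}|{version}|{arg_hash}"
        if key not in cache:
            cache[key] = func(*args)  # cache miss: compute and store
        return cache[key]
    return wrapper
```

Because `version` is baked in when the decorator runs, editing the function and re-importing the module yields a fresh key prefix: old entries are simply never read again, and become the orphans that `cache_maintenance()` sweeps up.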
---

## Testing Your Migration

### 1. Check for `smart_cacheable` usage

```bash
# This should find zero results after migration
grep -r "smart_cacheable" your_project/
```

### 2. Test auto-versioning behavior

```python
from afml.cache import cacheable

@cacheable()
def test_func(x):
    return x * 2

# First call
result1 = test_func(5)  # Cache miss

# Second call (should hit)
result2 = test_func(5)  # Cache hit

# Change the function (re-decorated, as if you had edited the source)
@cacheable()
def test_func(x):
    return x * 3  # Changed!

# Third call (should miss due to version change)
result3 = test_func(5)  # Cache miss (automatic!)
assert result3 == 15  # New result
```

### 3. Verify cleanup works

```python
from afml.cache import find_orphaned_caches

orphans = find_orphaned_caches()
print(f"Found {orphans['orphaned_count']} orphaned caches")
print(f"Total size: {orphans['total_size_mb']} MB")
```

---

## Troubleshooting

### Issue: Cache not invalidating on changes

**Cause**: Function source unavailable (built-in/dynamic)

**Solution**: Check the logs for warnings:

```python
# Look for:
# "Cannot hash source for my_func, using file mtime for versioning"
```

If the file mtime also fails, explicitly use `auto_versioning=False` and manage invalidation manually.
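The fallback chain behind those warnings (source hash, then file mtime, then an unversioned key) might look roughly like the following. `version_token` is a hypothetical helper written for illustration, not the actual `afml.cache` internals:

```python
import hashlib
import inspect
import os
import warnings

def version_token(func):
    """Best-effort version for a function: source hash > file mtime > none."""
    # Preferred: hash the function's source code
    try:
        src = inspect.getsource(func)
        return "v_" + hashlib.md5(src.encode()).hexdigest()[:8]
    except (OSError, TypeError):
        pass  # built-in, or defined dynamically: no retrievable source

    # Fallback: use the defining file's modification time
    try:
        mtime = os.path.getmtime(inspect.getfile(func))
        warnings.warn(f"Cannot hash source for {func.__name__}, "
                      "using file mtime for versioning")
        return f"v_mtime_{int(mtime)}"
    except (OSError, TypeError):
        # Last resort: no version info at all - cache can go stale
        warnings.warn(f"No version info for {func.__name__}; "
                      "caching without versioning")
        return "v_unversioned"
```

Each step degrades gracefully: a warning is emitted, but decoration never crashes. Only the last step leaves the cache genuinely unversioned, which is when a manual `auto_versioning=False` strategy makes sense.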
### Issue: Too many cache versions

**Cause**: Rapid development with many changes

**Solution**: Run cleanup more frequently:

```python
from afml.cache import cache_maintenance

cache_maintenance(
    clean_orphaned=True,
    min_orphan_age_hours=12  # More aggressive
)
```

### Issue: Expensive function cache lost

**Cause**: Auto-versioning invalidated the cache on a minor change

**Solution**: Opt out for that specific function:

```python
@cacheable(auto_versioning=False)
def expensive_stable_function(data):
    return days_of_computation(data)
```

---

## Backward Compatibility

### Old Decorator Aliases

These still work (no changes needed):

```python
from afml.cache import (
    robust_cacheable,      # = cacheable()
    time_aware_cacheable,  # = cacheable(time_aware=True)
    cv_cacheable,          # = cacheable()
)

# All now have auto_versioning=True by default
```

### Disabling Auto-Versioning Globally

If you want the old behavior everywhere (not recommended):

```python
# In your __init__.py or main module
from afml.cache import disable_auto_versioning

# Use this instead of cacheable
cacheable = disable_auto_versioning()

# Now all @cacheable() calls have auto_versioning=False
```

---

## Getting Help

### Check Cache Health

```python
from afml.cache import print_cache_report

print_cache_report()
```

### Debug a Specific Function

```python
from afml.cache import debug_function_cache

debug_function_cache("afml.features.my_func")
```

### Analyze Version Fragmentation

```python
from afml.cache import analyze_cache_versions, print_version_analysis

analysis = analyze_cache_versions()
print_version_analysis()
```

---

## Summary

✅ **What You Need to Do:**

1. Replace `@smart_cacheable` with `@cacheable()` (required)
2. Review expensive functions and opt out if needed (optional)
3. Set up periodic cache maintenance (recommended)

✅ **What's Better Now:**

- Automatic cache invalidation on code changes (correctness)
- No per-call overhead (performance)
- Complete invalidation for all args (reliability)
- Simpler mental model (clarity)

✅ **Default is Correct:**

- `auto_versioning=True` prevents stale cache bugs
- Only opt out for specific expensive stable functions
- When in doubt, use the default