# Cache System Migration Guide
## 🎯 TL;DR - What Changed
**Auto-versioning is now ENABLED BY DEFAULT.**
Your cache will automatically invalidate when function code changes. This prevents stale cache bugs.
**Most users need to do nothing** - just update and enjoy automatic cache invalidation.
**Only opt-out if:**
- Function takes hours/days to compute AND
- Function is stable/won't change AND
- You understand the risk of stale results
---
## Overview of Changes
1. **`auto_versioning=True` by default**: Cache keys include function source hash
2. **One decorator to rule them all**: `@cacheable()` replaces multiple decorators
3. **Removed `smart_cacheable`**: Now redundant (built into default behavior)
4. **Selective cleaner refocused**: Maintenance tool for orphaned caches
---
## Quick Migration Table
| Old Code | New Code | Notes |
|----------|----------|-------|
| `@robust_cacheable` | `@cacheable()` | Now has auto-versioning by default |
| `@time_aware_cacheable` | `@cacheable(time_aware=True)` | Now has auto-versioning by default |
| `@cv_cacheable` | `@cacheable()` | Now has auto-versioning by default |
| `@smart_cacheable` | `@cacheable()` | **REMOVED - now default behavior** |
| `@cacheable()` (old) | `@cacheable(auto_versioning=False)` | **Only if you need old behavior** |
---
## What is Auto-Versioning?
### The Problem It Solves
```python
# Without auto-versioning
@cacheable(auto_versioning=False)
def calculate_returns(prices):
    return prices.pct_change()

calculate_returns(df)  # Cache miss, stores result
calculate_returns(df)  # Cache hit ✓

# Developer fixes a bug (same decorator, new body)
@cacheable(auto_versioning=False)
def calculate_returns(prices):
    return prices.pct_change().fillna(0)  # Bug fix!

calculate_returns(df)  # Cache HIT - WRONG RESULT! ❌
# Returns OLD buggy result from cache
```
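The trap is easy to reproduce with a toy memoizer keyed only on the arguments (everything below is illustrative, not the library's code):

```python
# Toy cache keyed only on arguments -- the function's code is not part of the key
cache = {}

def cached_call(func, x):
    """Return a cached result for x, computing it with func on first use."""
    if x not in cache:
        cache[x] = func(x)
    return cache[x]

def buggy(x):
    return x * 2      # first version of the code

def fixed(x):
    return x * 2 + 1  # the "bug fix"

print(cached_call(buggy, 5))  # 10, computed and stored
print(cached_call(fixed, 5))  # still 10 -- stale result from the old code!
```

The second call returns the stale value because the key (`5`) is identical and nothing about the function's code participates in the lookup.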
### With Auto-Versioning (Now Default)
```python
# With auto-versioning (NEW DEFAULT)
@cacheable()  # auto_versioning=True by default
def calculate_returns(prices):
    return prices.pct_change()

calculate_returns(df)  # Cache miss, stores at key "v_abc123..."
calculate_returns(df)  # Cache hit ✓

# Developer fixes bug (same decorator, new body)
@cacheable()
def calculate_returns(prices):
    return prices.pct_change().fillna(0)  # Bug fix!

calculate_returns(df)  # Cache MISS - new key "v_def456..." ✓
# Computes with NEW correct code
```
---
## Migration Steps
### Step 1: Update `smart_cacheable` (REQUIRED)
**Old code:**
```python
from afml.cache import smart_cacheable

@smart_cacheable
def my_function(data):
    return data.mean()
```
**New code:**
```python
from afml.cache import cacheable

@cacheable()  # That's it! auto_versioning is now default
def my_function(data):
    return data.mean()
```
### Step 2: Review Expensive Functions (OPTIONAL)
If you have functions that take **hours to compute** and **rarely change**:
```python
@cacheable(auto_versioning=False)  # Explicit opt-out
def train_huge_model(data):
    """Takes 48 hours, changes once per year"""
    return expensive_training(data)
```
⚠️ **Warning**: With `auto_versioning=False`, code changes do NOT invalidate the cache:
```python
@cacheable(auto_versioning=False)
def train_huge_model(data):
    """Added this docstring"""  # THIS CHANGE WON'T INVALIDATE CACHE
    return expensive_training(data)  # May return stale result!
```
### Step 3: Clean Up Old Caches (RECOMMENDED)
After migration, clean up orphaned caches:
```python
from afml.cache import cache_maintenance

# One-time cleanup after migration
cache_maintenance(
    clean_orphaned=True,
    max_cache_size_mb=1000,
    max_age_days=30,
)
```
---
## Understanding Auto-Versioning Behavior
### How Cache Keys Work
**Without auto-versioning:**
```
cache_key = md5("module.function_name" + "arg_hashes")
          = "a1b2c3d4..."
```
**With auto-versioning (default):**
```
cache_key = md5("module.function_name" + "v_abc123" + "arg_hashes")
                                         ^^^^^^^^^^
                                         function source hash
          = "e5f6a7b8..."  # Different key!
```
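A minimal sketch of how such a version component can be derived (the helper name and key layout are illustrative; the library's internals may differ):

```python
import hashlib

def version_tag(source: str) -> str:
    """Derive a short version tag from a function's source text (illustrative)."""
    return "v_" + hashlib.md5(source.encode()).hexdigest()[:8]

old_src = "def calculate_returns(prices):\n    return prices.pct_change()\n"
new_src = old_src.replace("pct_change()", "pct_change().fillna(0)")

# Any edit to the source yields a different tag, hence a different cache key
print(version_tag(old_src) == version_tag(new_src))  # False
```

Because the tag feeds into the cache key, the edited function automatically looks up (and misses) a fresh key.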
### When Cache Invalidates
Cache invalidates when:
- ✅ Function body changes
- ✅ Function name changes
- ✅ Default parameters change
- ✅ Decorators change

Cache does **not** invalidate when:
- ❌ Comments change
- ❌ Docstrings change
### Graceful Fallback
For built-in/dynamic functions where source is unavailable:
```python
# Can't get source for built-ins
import numpy as np

@cacheable()  # Gracefully falls back to file mtime
def use_builtin(data):
    return np.mean(data)  # np.mean has no source

# Warning logged, but doesn't crash
```
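The mtime fallback described above might look roughly like this (a sketch under stated assumptions; `version_from_mtime` is a hypothetical helper, not the library's actual code):

```python
import os

def version_from_mtime(path):
    """Fallback version tag from a file's modification time.
    Used when the function's source cannot be read and hashed."""
    try:
        return f"v_mtime_{os.path.getmtime(path):.0f}"
    except OSError:
        return None  # no file either -- caller must decide (e.g. skip versioning)
```

Editing the file bumps its mtime, so the tag still changes on save, just with coarser granularity than a source hash.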
---
## Common Scenarios
### Scenario 1: Development (Default - No Changes Needed)
```python
from afml.cache import cacheable

@cacheable()  # Just use defaults!
def my_feature(data, window):
    """Feature under active development"""
    return data.rolling(window).mean()

# Work normally - cache auto-invalidates on changes
result1 = my_feature(df, 20)
result2 = my_feature(df, 20)  # Cache hit
# ... modify my_feature ...
result3 = my_feature(df, 20)  # Cache miss (automatic!)
```
### Scenario 2: Expensive Computation (Explicit Opt-Out)
```python
from afml.cache import cacheable

@cacheable(auto_versioning=False)  # Explicit opt-out
def train_production_model(data):
    """Takes 24 hours, changes rarely, want to preserve cache"""
    return expensive_training(data)
```
### Scenario 3: Bulk Opt-Out for Stable Functions
```python
from afml.cache import disable_auto_versioning

# Create custom decorator without versioning
cacheable_stable = disable_auto_versioning()

@cacheable_stable()
def stable_func_1(data): ...

@cacheable_stable()
def stable_func_2(data): ...

@cacheable_stable(time_aware=True)  # Can combine with other options
def stable_func_3(data): ...
```
### Scenario 4: Mixed Strategy
```python
from afml.cache import cacheable

# Under development - auto-versioning
@cacheable()
def experimental_feature(data):
    return data.ewm(span=20).mean()

# Production stable - opt-out
@cacheable(auto_versioning=False)
def load_data(symbol, start, end):
    return expensive_data_load(symbol, start, end)
```
---
## Maintenance & Cleanup
### Periodic Cleanup (Recommended)
Set up weekly/monthly cleanup:
```python
from afml.cache import cache_maintenance

# Run weekly via cron/scheduler
cache_maintenance(
    clean_orphaned=True,      # Remove old function versions
    max_cache_size_mb=2000,   # Enforce size limit
    max_age_days=90,          # Remove very old caches
    min_orphan_age_hours=48,  # Keep recent orphans (grace period)
)
```
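One way to schedule this is a crontab entry invoking a small wrapper script around the call above (the interpreter path and script name are placeholders):

```shell
# Run the cache cleanup script every Sunday at 03:00
0 3 * * 0 /usr/bin/python3 /path/to/cache_cleanup.py
```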
### Analyze Cache Fragmentation
Check if auto-versioning is creating too many versions:
```python
from afml.cache import print_version_analysis

print_version_analysis()
# Output:
# ========================================
# CACHE VERSION ANALYSIS
# ========================================
# Functions with versions: 12
# Total versions: 34
# Total size: 1.2 GB
#
# Top fragmented functions:
#   1. calculate_feature
#      Versions: 8
#      Size: 450 MB
```
If fragmentation is high, consider opting out for those functions.
---
## Performance Implications
### Overhead of Auto-Versioning
**Minimal overhead** - hash computed once at decorator application:
```python
# Old smart_cacheable: ~0.5ms PER CALL
@smart_cacheable  # Read source + hash on EVERY call
def fast_func(x):
    return x + 1

# New auto_versioning: ~0ms per call
@cacheable()  # Hash computed ONCE at import time
def fast_func(x):
    return x + 1
```
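The once-at-decoration pattern can be sketched as follows (illustrative, not the library's actual code); `inspect.getsource` can fail for dynamic or built-in functions, which is handled up front:

```python
import functools
import hashlib
import inspect

def versioned(func):
    """Compute the source hash once, at decoration time,
    so calls pay no per-call hashing cost."""
    try:
        tag = hashlib.md5(inspect.getsource(func).encode()).hexdigest()[:8]
    except (OSError, TypeError):
        tag = "no_source"  # graceful-fallback point (e.g. switch to mtime)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)  # tag already computed; no hashing here

    wrapper.version = tag  # available for cache-key construction
    return wrapper
```

Decorating `fast_func` with this pattern pays the hashing cost exactly once, at import time, which is why the per-call overhead drops to effectively zero.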
### Storage Implications
With auto-versioning, multiple versions can coexist temporarily:
```bash
cache/
  my_module/
    my_function/
      v_abc123_args_xyz/   # Version 1 (orphaned)
      v_def456_args_xyz/   # Version 2 (current)
      v_ghi789_args_xyz/   # Version 3 (current)
```
**Mitigation**: Run `cache_maintenance()` periodically to clean orphans.
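To see how many versions have piled up, directories matching that layout can be counted with a small helper (hypothetical; adapt the glob pattern to your actual cache layout):

```python
from collections import Counter
from pathlib import Path

def count_versions(cache_root):
    """Count 'v_*' version directories per function under
    cache_root/<module>/<function>/v_.../ (layout as sketched above)."""
    counts = Counter()
    for vdir in Path(cache_root).glob("*/*/v_*"):
        if vdir.is_dir():
            counts[vdir.parent.name] += 1
    return counts
```

Functions with many versions are the first candidates for a cleanup run or an explicit opt-out.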
---
## Testing Your Migration
### 1. Check for `smart_cacheable` usage
```bash
# This should find zero results after migration
grep -r "smart_cacheable" your_project/
```
### 2. Test auto-versioning behavior
```python
from afml.cache import cacheable

@cacheable()
def test_func(x):
    return x * 2

# First call
result1 = test_func(5)  # Cache miss

# Second call (should hit)
result2 = test_func(5)  # Cache hit

# Change the function -- keep the decorator (in practice, edit the source)
@cacheable()
def test_func(x):
    return x * 3  # Changed!

# Third call (should miss due to version change)
result3 = test_func(5)  # Cache miss (automatic!)
assert result3 == 15  # New result
```
### 3. Verify cleanup works
```python
from afml.cache import find_orphaned_caches
orphans = find_orphaned_caches()
print(f"Found {orphans['orphaned_count']} orphaned caches")
print(f"Total size: {orphans['total_size_mb']} MB")
```
---
## Troubleshooting
### Issue: Cache not invalidating on changes
**Cause**: Function source unavailable (built-in/dynamic)
**Solution**: Check logs for warnings:
```python
# Look for:
# "Cannot hash source for my_func, using file mtime for versioning"
```
If file mtime also fails, explicitly use `auto_versioning=False` and manage manually.
### Issue: Too many cache versions
**Cause**: Rapid development with many changes
**Solution**: Run cleanup more frequently:
```python
from afml.cache import cache_maintenance

cache_maintenance(
    clean_orphaned=True,
    min_orphan_age_hours=12,  # More aggressive
)
```
### Issue: Expensive function cache lost
**Cause**: Auto-versioning invalidated cache on minor change
**Solution**: Opt-out for that specific function:
```python
@cacheable(auto_versioning=False)
def expensive_stable_function(data):
    return days_of_computation(data)
```
---
## Backward Compatibility
### Old Decorator Aliases
These still work (no changes needed):
```python
from afml.cache import (
    robust_cacheable,      # = cacheable()
    time_aware_cacheable,  # = cacheable(time_aware=True)
    cv_cacheable,          # = cacheable()
)

# All now have auto_versioning=True by default
```
### Disabling Auto-Versioning Globally
If you want old behavior everywhere (not recommended):
```python
# In your __init__.py or main module
from afml.cache import disable_auto_versioning
# Use this instead of cacheable
cacheable = disable_auto_versioning()
# Now all @cacheable() calls have auto_versioning=False
```
---
## Getting Help
### Check Cache Health
```python
from afml.cache import print_cache_report
print_cache_report()
```
### Debug Specific Function
```python
from afml.cache import debug_function_cache
debug_function_cache("afml.features.my_func")
```
### Analyze Version Fragmentation
```python
from afml.cache import analyze_cache_versions, print_version_analysis
analysis = analyze_cache_versions()
print_version_analysis()
```
---
## Summary
**What You Need to Do:**
1. Replace `@smart_cacheable` with `@cacheable()` (required)
2. Review expensive functions and opt-out if needed (optional)
3. Set up periodic cache maintenance (recommended)
**What's Better Now:**
- Automatic cache invalidation on code changes (correctness)
- No per-call overhead (performance)
- Complete invalidation for all args (reliability)
- Simpler mental model (clarity)
**Default is Correct:**
- `auto_versioning=True` prevents stale cache bugs
- Only opt-out for specific expensive stable functions
- When in doubt, use the default