# Cache System Migration Guide

## 🎯 TL;DR - What Changed

**Auto-versioning is now ENABLED BY DEFAULT.** Your cache will automatically invalidate when function code changes. This prevents stale cache bugs.

**Most users need to do nothing** - just update and enjoy automatic cache invalidation.

**Only opt out if:**

- The function takes hours/days to compute, AND
- The function is stable and won't change, AND
- You understand the risk of stale results

---

## Overview of Changes

1. **`auto_versioning=True` by default**: Cache keys include a hash of the function's source
2. **One decorator to rule them all**: `@cacheable()` replaces multiple decorators
3. **Removed `smart_cacheable`**: Now redundant (built into the default behavior)
4. **Selective cleaner refocused**: Maintenance tool for orphaned caches

---

## Quick Migration Table

| Old Code | New Code | Notes |
|----------|----------|-------|
| `@robust_cacheable` | `@cacheable()` | Now has auto-versioning by default |
| `@time_aware_cacheable` | `@cacheable(time_aware=True)` | Now has auto-versioning by default |
| `@cv_cacheable` | `@cacheable()` | Now has auto-versioning by default |
| `@smart_cacheable` | `@cacheable()` | **REMOVED - now default behavior** |
| `@cacheable()` (old) | `@cacheable(auto_versioning=False)` | **Only if you need the old behavior** |

---

## What is Auto-Versioning?

### The Problem It Solves

```python
# Without auto-versioning
@cacheable(auto_versioning=False)
def calculate_returns(prices):
    return prices.pct_change()

calculate_returns(df)  # Cache miss, stores result
calculate_returns(df)  # Cache hit ✓

# Developer fixes a bug
def calculate_returns(prices):
    return prices.pct_change().fillna(0)  # Bug fix!

calculate_returns(df)  # Cache HIT - WRONG RESULT!
# ❌ Returns OLD buggy result from cache
```

### With Auto-Versioning (Now Default)

```python
# With auto-versioning (NEW DEFAULT)
@cacheable()  # auto_versioning=True by default
def calculate_returns(prices):
    return prices.pct_change()

calculate_returns(df)  # Cache miss, stores at key "v_abc123..."
calculate_returns(df)  # Cache hit ✓

# Developer fixes a bug
def calculate_returns(prices):
    return prices.pct_change().fillna(0)  # Bug fix!

calculate_returns(df)  # Cache MISS - new key "v_def456..." ✓
# Computes with the NEW correct code
```

---

## Migration Steps

### Step 1: Update `smart_cacheable` (REQUIRED)

**Old code:**

```python
from afml.cache import smart_cacheable

@smart_cacheable
def my_function(data):
    return data.mean()
```

**New code:**

```python
from afml.cache import cacheable

@cacheable()  # That's it! auto_versioning is now the default
def my_function(data):
    return data.mean()
```

### Step 2: Review Expensive Functions (OPTIONAL)

If you have functions that take **hours to compute** and **rarely change**:

```python
@cacheable(auto_versioning=False)  # Explicit opt-out
def train_huge_model(data):
    """Takes 48 hours, changes once per year"""
    return expensive_training(data)
```

⚠️ **Warning**: With `auto_versioning=False`, even code changes will NOT invalidate the cache:

```python
@cacheable(auto_versioning=False)
def train_huge_model(data):
    """Added this docstring"""  # THIS CHANGE WON'T INVALIDATE CACHE
    return expensive_training(data)  # May return a stale result!
```

### Step 3: Clean Up Old Caches (RECOMMENDED)

After migration, clean up orphaned caches:

```python
from afml.cache import cache_maintenance

# One-time cleanup after migration
cache_maintenance(
    clean_orphaned=True,
    max_cache_size_mb=1000,
    max_age_days=30
)
```

---

## Understanding Auto-Versioning Behavior

### How Cache Keys Work

**Without auto-versioning:**

```
cache_key = md5("module.function_name" + "arg_hashes")
          = "a1b2c3d4..."
```

**With auto-versioning (default):**

```
cache_key = md5("module.function_name" + "v_abc123" + "arg_hashes")
                                          ^^^^^^^^
                                          function source hash
          = "e5f6a7b8..."  # Different key!
```

### When Cache Invalidates

The cache invalidates when:

- ✅ Function body changes
- ✅ Function name changes
- ✅ Default parameters change
- ✅ Decorators change

It does **not** invalidate when:

- ❌ Comments change
- ❌ Docstrings change

### Graceful Fallback

For built-in or dynamically created functions, the source is unavailable and cannot be hashed:

```python
# Can't get source for a C built-in, so there is nothing to hash
cached_min = cacheable()(min)

# Gracefully falls back to file mtime for versioning
# A warning is logged, but nothing crashes
```

---

## Common Scenarios

### Scenario 1: Development (Default - No Changes Needed)

```python
from afml.cache import cacheable

@cacheable()  # Just use defaults!
def my_feature(data, window):
    """Feature under active development"""
    return data.rolling(window).mean()

# Work normally - cache auto-invalidates on changes
result1 = my_feature(df, 20)
result2 = my_feature(df, 20)  # Cache hit

# ... modify my_feature ...
result3 = my_feature(df, 20)  # Cache miss (automatic!)
```

### Scenario 2: Expensive Computation (Explicit Opt-Out)

```python
from afml.cache import cacheable

@cacheable(auto_versioning=False)  # Explicit opt-out
def train_production_model(data):
    """Takes 24 hours, changes rarely, want to preserve cache"""
    return expensive_training(data)
```

### Scenario 3: Bulk Opt-Out for Stable Functions

```python
from afml.cache import disable_auto_versioning

# Create a custom decorator without versioning
cacheable_stable = disable_auto_versioning()

@cacheable_stable()
def stable_func_1(data): ...

@cacheable_stable()
def stable_func_2(data): ...

@cacheable_stable(time_aware=True)  # Can combine with other options
def stable_func_3(data): ...
```

### Scenario 4: Mixed Strategy

```python
from afml.cache import cacheable

# Under development - auto-versioning
@cacheable()
def experimental_feature(data):
    return data.ewm(span=20).mean()

# Production stable - opt-out
@cacheable(auto_versioning=False)
def load_data(symbol, start, end):
    return expensive_data_load(symbol, start, end)
```

---

## Maintenance & Cleanup

### Periodic Cleanup (Recommended)

Set up weekly or monthly cleanup:

```python
from afml.cache import cache_maintenance

# Run weekly via cron/scheduler
cache_maintenance(
    clean_orphaned=True,      # Remove old function versions
    max_cache_size_mb=2000,   # Enforce size limit
    max_age_days=90,          # Remove very old caches
    min_orphan_age_hours=48   # Keep recent orphans (grace period)
)
```

### Analyze Cache Fragmentation

Check whether auto-versioning is creating too many versions:

```python
from afml.cache import print_version_analysis

print_version_analysis()

# Output:
# ========================================
# CACHE VERSION ANALYSIS
# ========================================
# Functions with versions: 12
# Total versions: 34
# Total size: 1.2 GB
#
# Top fragmented functions:
#   1. calculate_feature
#      Versions: 8
#      Size: 450 MB
```

If fragmentation is high, consider opting out for those functions.

---

## Performance Implications

### Overhead of Auto-Versioning

**Minimal overhead** - the hash is computed once, when the decorator is applied:

```python
# Old smart_cacheable: 0.5ms PER CALL
@smart_cacheable  # Read source + hash on EVERY call
def fast_func(x):
    return x + 1

# New auto_versioning: 0ms per call
@cacheable()  # Hash computed ONCE at import time
def fast_func(x):
    return x + 1
```

### Storage Implications

With auto-versioning, multiple versions can coexist temporarily:

```bash
cache/
  my_module/
    my_function/
      v_abc123_args_xyz/   # Version 1 (orphaned)
      v_def456_args_xyz/   # Version 2 (orphaned)
      v_ghi789_args_xyz/   # Version 3 (current)
```

**Mitigation**: Run `cache_maintenance()` periodically to clean orphans.
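The mechanics behind this layout - one source hash computed at decoration time, combined with per-call argument hashes - can be sketched in plain Python. This is an illustrative toy (`sketch_cacheable` is a made-up name), not the `afml.cache` implementation:

```python
import functools
import hashlib
import inspect

def sketch_cacheable(func):
    """Toy auto-versioning cache: source hashed ONCE, at decoration time."""
    try:
        src = inspect.getsource(func)
        version = "v_" + hashlib.md5(src.encode()).hexdigest()[:8]
    except (OSError, TypeError):
        version = "v_unversioned"  # the real library falls back to file mtime
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        # Per-call work is just hashing the arguments, not re-reading source
        arg_hash = hashlib.md5(repr(args).encode()).hexdigest()[:8]
        key = f"{func.__module__}.{func.__qualname__}|{version}|{arg_hash}"
        if key not in cache:
            cache[key] = func(*args)  # cache miss: compute and store
        return cache[key]
    return wrapper
```

Because `version` is baked in when the decorator runs, editing the function and re-importing the module yields a fresh key prefix: old entries are simply never read again, and become the orphans that `cache_maintenance()` sweeps up.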
---

## Testing Your Migration

### 1. Check for `smart_cacheable` usage

```bash
# This should find zero results after migration
grep -r "smart_cacheable" your_project/
```

### 2. Test auto-versioning behavior

```python
from afml.cache import cacheable

@cacheable()
def test_func(x):
    return x * 2

# First call
result1 = test_func(5)  # Cache miss

# Second call (should hit)
result2 = test_func(5)  # Cache hit

# Change the function (re-decorated, as if you had edited the source)
@cacheable()
def test_func(x):
    return x * 3  # Changed!

# Third call (should miss due to version change)
result3 = test_func(5)  # Cache miss (automatic!)
assert result3 == 15  # New result
```

### 3. Verify cleanup works

```python
from afml.cache import find_orphaned_caches

orphans = find_orphaned_caches()
print(f"Found {orphans['orphaned_count']} orphaned caches")
print(f"Total size: {orphans['total_size_mb']} MB")
```

---

## Troubleshooting

### Issue: Cache not invalidating on changes

**Cause**: Function source unavailable (built-in/dynamic)

**Solution**: Check the logs for warnings:

```python
# Look for:
# "Cannot hash source for my_func, using file mtime for versioning"
```

If the file mtime also fails, explicitly use `auto_versioning=False` and manage invalidation manually.
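The fallback chain behind those warnings (source hash, then file mtime, then an unversioned key) might look roughly like the following. `version_token` is a hypothetical helper written for illustration, not the actual `afml.cache` internals:

```python
import hashlib
import inspect
import os
import warnings

def version_token(func):
    """Best-effort version for a function: source hash > file mtime > none."""
    # Preferred: hash the function's source code
    try:
        src = inspect.getsource(func)
        return "v_" + hashlib.md5(src.encode()).hexdigest()[:8]
    except (OSError, TypeError):
        pass  # built-in, or defined dynamically: no retrievable source

    # Fallback: use the defining file's modification time
    try:
        mtime = os.path.getmtime(inspect.getfile(func))
        warnings.warn(f"Cannot hash source for {func.__name__}, "
                      "using file mtime for versioning")
        return f"v_mtime_{int(mtime)}"
    except (OSError, TypeError):
        # Last resort: no version info at all - cache can go stale
        warnings.warn(f"No version info for {func.__name__}; "
                      "caching without versioning")
        return "v_unversioned"
```

Each step degrades gracefully: a warning is emitted, but decoration never crashes. Only the last step leaves the cache genuinely unversioned, which is when a manual `auto_versioning=False` strategy makes sense.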
### Issue: Too many cache versions

**Cause**: Rapid development with many changes

**Solution**: Run cleanup more frequently:

```python
from afml.cache import cache_maintenance

cache_maintenance(
    clean_orphaned=True,
    min_orphan_age_hours=12  # More aggressive
)
```

### Issue: Expensive function cache lost

**Cause**: Auto-versioning invalidated the cache on a minor change

**Solution**: Opt out for that specific function:

```python
@cacheable(auto_versioning=False)
def expensive_stable_function(data):
    return days_of_computation(data)
```

---

## Backward Compatibility

### Old Decorator Aliases

These still work (no changes needed):

```python
from afml.cache import (
    robust_cacheable,      # = cacheable()
    time_aware_cacheable,  # = cacheable(time_aware=True)
    cv_cacheable,          # = cacheable()
)

# All now have auto_versioning=True by default
```

### Disabling Auto-Versioning Globally

If you want the old behavior everywhere (not recommended):

```python
# In your __init__.py or main module
from afml.cache import disable_auto_versioning

# Use this instead of cacheable
cacheable = disable_auto_versioning()

# Now all @cacheable() calls have auto_versioning=False
```

---

## Getting Help

### Check Cache Health

```python
from afml.cache import print_cache_report

print_cache_report()
```

### Debug a Specific Function

```python
from afml.cache import debug_function_cache

debug_function_cache("afml.features.my_func")
```

### Analyze Version Fragmentation

```python
from afml.cache import analyze_cache_versions, print_version_analysis

analysis = analyze_cache_versions()
print_version_analysis()
```

---

## Summary

✅ **What You Need to Do:**

1. Replace `@smart_cacheable` with `@cacheable()` (required)
2. Review expensive functions and opt out if needed (optional)
3. Set up periodic cache maintenance (recommended)

✅ **What's Better Now:**

- Automatic cache invalidation on code changes (correctness)
- No per-call overhead (performance)
- Complete invalidation for all args (reliability)
- Simpler mental model (clarity)

✅ **Default is Correct:**

- `auto_versioning=True` prevents stale cache bugs
- Only opt out for specific expensive stable functions
- When in doubt, use the default