Cache System Migration Guide

🎯 TL;DR - What Changed

Auto-versioning is now ENABLED BY DEFAULT.

Your cache will automatically invalidate when function code changes. This prevents stale cache bugs.

Most users need to do nothing - just update and enjoy automatic cache invalidation.

Only opt-out if:

  • Function takes hours/days to compute AND
  • Function is stable/won't change AND
  • You understand the risk of stale results

Overview of Changes

  1. auto_versioning=True by default: Cache keys include function source hash
  2. One decorator to rule them all: @cacheable() replaces multiple decorators
  3. Removed smart_cacheable: Now redundant (built into default behavior)
  4. Selective cleaner refocused: Maintenance tool for orphaned caches

Quick Migration Table

| Old Code | New Code | Notes |
|----------|----------|-------|
| `@robust_cacheable` | `@cacheable()` | Now has auto-versioning by default |
| `@time_aware_cacheable` | `@cacheable(time_aware=True)` | Now has auto-versioning by default |
| `@cv_cacheable` | `@cacheable()` | Now has auto-versioning by default |
| `@smart_cacheable` | `@cacheable()` | REMOVED - now default behavior |
| `@cacheable()` (old) | `@cacheable(auto_versioning=False)` | Only if you need old behavior |

What is Auto-Versioning?

The Problem It Solves

# Without auto-versioning
@cacheable(auto_versioning=False)
def calculate_returns(prices):
    return prices.pct_change()

calculate_returns(df)  # Cache miss, stores result
calculate_returns(df)  # Cache hit ✓

# Developer fixes a bug (same decorator, new code)
@cacheable(auto_versioning=False)
def calculate_returns(prices):
    return prices.pct_change().fillna(0)  # Bug fix!

calculate_returns(df)  # Cache HIT - WRONG RESULT! ❌
# Returns OLD buggy result from cache

With Auto-Versioning (Now Default)

# With auto-versioning (NEW DEFAULT)
@cacheable()  # auto_versioning=True by default
def calculate_returns(prices):
    return prices.pct_change()

calculate_returns(df)  # Cache miss, stores at key "v_abc123..."
calculate_returns(df)  # Cache hit ✓

# Developer fixes the bug (same decorator, new code)
@cacheable()
def calculate_returns(prices):
    return prices.pct_change().fillna(0)  # Bug fix!

calculate_returns(df)  # Cache MISS - new key "v_def456..." ✓
# Computes with NEW correct code

Migration Steps

Step 1: Update smart_cacheable (REQUIRED)

Old code:

from afml.cache import smart_cacheable

@smart_cacheable
def my_function(data):
    return data.mean()

New code:

from afml.cache import cacheable

@cacheable()  # That's it! auto_versioning is now default
def my_function(data):
    return data.mean()

Step 2: Review Expensive Functions (OPTIONAL)

If you have functions that take hours to compute and rarely change:

@cacheable(auto_versioning=False)  # Explicit opt-out
def train_huge_model(data):
    """Takes 48 hours, changes once per year"""
    return expensive_training(data)

⚠️ Warning: With auto_versioning=False, no code change invalidates the cache - not even a real bug fix:

@cacheable(auto_versioning=False)
def train_huge_model(data):
    """Added this docstring"""  # This change won't invalidate the cache
    return expensive_training(data)  # Nor would a bug fix here - stale results possible!

Step 3: Clean Up Old Caches (RECOMMENDED)

After migration, clean up orphaned caches:

from afml.cache import cache_maintenance

# One-time cleanup after migration
cache_maintenance(
    clean_orphaned=True,
    max_cache_size_mb=1000,
    max_age_days=30
)

Understanding Auto-Versioning Behavior

How Cache Keys Work

Without auto-versioning:

cache_key = md5("module.function_name" + "arg_hashes")
           = "a1b2c3d4..."

With auto-versioning (default):

cache_key = md5("module.function_name" + "v_abc123" + "arg_hashes")
                                         ^^^^^^^^^^
                                    function source hash
           = "e5f6a7b8..."  # Different key!
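The same idea can be sketched in plain Python (an illustrative approximation built only on the standard library; `versioned_cache_key` is a hypothetical helper, not the actual afml implementation):

```python
import hashlib
import inspect

def versioned_cache_key(func, *args):
    # Hypothetical sketch of a versioned cache key, not the afml internals
    try:
        source = inspect.getsource(func)  # changes whenever the code changes
    except (OSError, TypeError):
        # Graceful fallback when source is unavailable (built-ins, exec'd code)
        code = getattr(func, "__code__", None)
        source = repr((code.co_code, code.co_consts)) if code else func.__qualname__
    version = "v_" + hashlib.md5(source.encode()).hexdigest()[:8]
    arg_hash = hashlib.md5(repr(args).encode()).hexdigest()[:8]
    name = f"{func.__module__}.{func.__qualname__}"
    return hashlib.md5(f"{name}|{version}|{arg_hash}".encode()).hexdigest()
```

Because the version component is derived from the function itself, any edit to the body produces a different key, which is exactly why stale results are never served.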

When Cache Invalidates

Cache invalidates when:

  • ✅ Function body changes
  • ✅ Function name changes
  • ✅ Default parameters change
  • ✅ Decorators change
  • ❌ Comments change (graceful: uses file mtime as fallback)
  • ❌ Docstrings change (graceful: uses file mtime as fallback)

Graceful Fallback

For dynamically created functions whose source cannot be retrieved (e.g. code built with exec or generated at runtime):

# inspect cannot retrieve source for dynamically compiled code
namespace = {}
exec("def dynamic_mean(data):\n    return sum(data) / len(data)", namespace)

use_dynamic = cacheable()(namespace["dynamic_mean"])  # Gracefully falls back to file mtime

# Warning logged, but doesn't crash

Common Scenarios

Scenario 1: Development (Default - No Changes Needed)

from afml.cache import cacheable

@cacheable()  # Just use defaults!
def my_feature(data, window):
    """Feature under active development"""
    return data.rolling(window).mean()

# Work normally - cache auto-invalidates on changes
result1 = my_feature(df, 20)
result2 = my_feature(df, 20)  # Cache hit

# ... modify my_feature ...

result3 = my_feature(df, 20)  # Cache miss (automatic!)

Scenario 2: Expensive Computation (Explicit Opt-Out)

from afml.cache import cacheable

@cacheable(auto_versioning=False)  # Explicit opt-out
def train_production_model(data):
    """Takes 24 hours, changes rarely, want to preserve cache"""
    return expensive_training(data)

Scenario 3: Bulk Opt-Out for Stable Functions

from afml.cache import disable_auto_versioning

# Create custom decorator without versioning
cacheable_stable = disable_auto_versioning()

@cacheable_stable()
def stable_func_1(data): ...

@cacheable_stable()
def stable_func_2(data): ...

@cacheable_stable(time_aware=True)  # Can combine with other options
def stable_func_3(data): ...

Scenario 4: Mixed Strategy

from afml.cache import cacheable

# Under development - auto-versioning
@cacheable()
def experimental_feature(data):
    return data.ewm(span=20).mean()

# Production stable - opt-out
@cacheable(auto_versioning=False)
def load_data(symbol, start, end):
    return expensive_data_load(symbol, start, end)

Maintenance & Cleanup

Periodic Cleanup (Recommended)

Set up weekly/monthly cleanup:

from afml.cache import cache_maintenance

# Run weekly via cron/scheduler
cache_maintenance(
    clean_orphaned=True,      # Remove old function versions
    max_cache_size_mb=2000,   # Enforce size limit
    max_age_days=90,          # Remove very old caches
    min_orphan_age_hours=48   # Keep recent orphans (grace period)
)
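One way to run this on a schedule is a crontab entry (hypothetical paths - adjust `/path/to/project` and the script name to your setup, after saving the `cache_maintenance()` call above as a standalone script):

```shell
# Hypothetical crontab entry: weekly cleanup every Sunday at 03:00
0 3 * * 0 cd /path/to/project && python scripts/cache_cleanup.py >> logs/cache_cleanup.log 2>&1
```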

Analyze Cache Fragmentation

Check if auto-versioning is creating too many versions:

from afml.cache import print_version_analysis

print_version_analysis()
# Output:
# ========================================
# CACHE VERSION ANALYSIS
# ========================================
# Functions with versions: 12
# Total versions: 34
# Total size: 1.2 GB
# 
# Top fragmented functions:
#   1. calculate_feature
#      Versions: 8
#      Size: 450 MB

If fragmentation is high, consider opting out for those functions.


Performance Implications

Overhead of Auto-Versioning

Minimal overhead - hash computed once at decorator application:

# Old smart_cacheable: 0.5ms PER CALL
@smart_cacheable  # Read source + hash on EVERY call
def fast_func(x):
    return x + 1

# New auto_versioning: 0ms per call
@cacheable()  # Hash computed ONCE at import time
def fast_func(x):
    return x + 1

Storage Implications

With auto-versioning, multiple versions can coexist temporarily:

cache/
  my_module/
    my_function/
      v_abc123_args_xyz/  # Version 1 (orphaned)
      v_def456_args_xyz/  # Version 2 (current)
      v_ghi789_args_xyz/  # Version 3 (current)

Mitigation: Run cache_maintenance() periodically to clean orphans.
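To see how much space each version occupies under that layout, a small standard-library scan works (a sketch assuming the directory structure shown above; `version_sizes` is a hypothetical helper, not an afml API):

```python
import tempfile
from pathlib import Path

def version_sizes(cache_root):
    # Map each cached function to {version_dir: total bytes} under the layout above
    report = {}
    for version_dir in Path(cache_root).glob("*/*/v_*"):
        func = version_dir.parent.name
        size = sum(f.stat().st_size for f in version_dir.rglob("*") if f.is_file())
        report.setdefault(func, {})[version_dir.name] = size
    return report

# Demo on a miniature cache tree mirroring the layout above
root = Path(tempfile.mkdtemp())
for version in ["v_abc123_args_xyz", "v_def456_args_xyz"]:
    d = root / "my_module" / "my_function" / version
    d.mkdir(parents=True)
    (d / "result.pkl").write_bytes(b"x" * 1024)

print(version_sizes(root))
```

Functions with many large `v_*` directories are the ones worth cleaning up first.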


Testing Your Migration

1. Check for smart_cacheable usage

# This should find zero results after migration
grep -r "smart_cacheable" your_project/

2. Test auto-versioning behavior

from afml.cache import cacheable

@cacheable()
def test_func(x):
    return x * 2

# First call
result1 = test_func(5)  # Cache miss

# Second call (should hit)
result2 = test_func(5)  # Cache hit

# Change the function (re-applying the decorator, as when you edit the source and rerun)
@cacheable()
def test_func(x):
    return x * 3  # Changed!

# Third call (should miss due to version change)
result3 = test_func(5)  # Cache miss (automatic!)

assert result3 == 15  # New result

3. Verify cleanup works

from afml.cache import find_orphaned_caches

orphans = find_orphaned_caches()
print(f"Found {orphans['orphaned_count']} orphaned caches")
print(f"Total size: {orphans['total_size_mb']} MB")

Troubleshooting

Issue: Cache not invalidating on changes

Cause: Function source unavailable (built-in/dynamic)

Solution: Check logs for warnings:

# Look for:
# "Cannot hash source for my_func, using file mtime for versioning"

If the file mtime fallback also fails, explicitly set auto_versioning=False and manage invalidation manually.
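To confirm up front whether a function's source can be hashed at all, a quick standard-library check helps (`source_available` is a hypothetical helper for illustration):

```python
import inspect
import math

def source_available(func):
    # True if inspect can retrieve the source, so auto-versioning can hash it
    try:
        inspect.getsource(func)
        return True
    except (OSError, TypeError):
        return False

print(source_available(math.sqrt))          # built-in: no source to hash
print(source_available(inspect.getsource))  # pure-Python: source available
```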

Issue: Too many cache versions

Cause: Rapid development with many changes

Solution: Run cleanup more frequently:

from afml.cache import cache_maintenance

cache_maintenance(
    clean_orphaned=True,
    min_orphan_age_hours=12  # More aggressive
)

Issue: Expensive function cache lost

Cause: Auto-versioning invalidated cache on minor change

Solution: Opt-out for that specific function:

@cacheable(auto_versioning=False)
def expensive_stable_function(data):
    return days_of_computation(data)

Backward Compatibility

Old Decorator Aliases

These still work (no changes needed):

from afml.cache import (
    robust_cacheable,      # = cacheable()
    time_aware_cacheable,  # = cacheable(time_aware=True)
    cv_cacheable,          # = cacheable()
)

# All now have auto_versioning=True by default

Disabling Auto-Versioning Globally

If you want old behavior everywhere (not recommended):

# In your __init__.py or main module
from afml.cache import disable_auto_versioning

# Use this instead of cacheable
cacheable = disable_auto_versioning()

# Now all @cacheable() calls have auto_versioning=False

Getting Help

Check Cache Health

from afml.cache import print_cache_report
print_cache_report()

Debug Specific Function

from afml.cache import debug_function_cache
debug_function_cache("afml.features.my_func")

Analyze Version Fragmentation

from afml.cache import analyze_cache_versions, print_version_analysis

analysis = analyze_cache_versions()
print_version_analysis()

Summary

What You Need to Do:

  1. Replace @smart_cacheable with @cacheable() (required)
  2. Review expensive functions and opt-out if needed (optional)
  3. Set up periodic cache maintenance (recommended)

What's Better Now:

  • Automatic cache invalidation on code changes (correctness)
  • No per-call overhead (performance)
  • Complete invalidation for all args (reliability)
  • Simpler mental model (clarity)

Default is Correct:

  • auto_versioning=True prevents stale cache bugs
  • Only opt-out for specific expensive stable functions
  • When in doubt, use the default