# Cache System Migration Guide
## 🎯 TL;DR - What Changed
**Auto-versioning is now ENABLED BY DEFAULT.**
Your cache will automatically invalidate when function code changes. This prevents stale cache bugs.
**Most users need to do nothing** - just update and enjoy automatic cache invalidation.
**Only opt-out if:**
- Function takes hours/days to compute AND
- Function is stable/won't change AND
- You understand the risk of stale results
---
## Overview of Changes
1. **`auto_versioning=True` by default**: Cache keys include function source hash
2. **One decorator to rule them all**: `@cacheable()` replaces multiple decorators
3. **Removed `smart_cacheable`**: Now redundant (built into default behavior)
4. **Selective cleaner refocused**: Maintenance tool for orphaned caches
---
## Quick Migration Table
| Old Code | New Code | Notes |
|----------|----------|-------|
| `@robust_cacheable` | `@cacheable()` | Now has auto-versioning by default |
| `@time_aware_cacheable` | `@cacheable(time_aware=True)` | Now has auto-versioning by default |
| `@cv_cacheable` | `@cacheable()` | Now has auto-versioning by default |
| `@smart_cacheable` | `@cacheable()` | **REMOVED - now default behavior** |
| `@cacheable()` (old) | `@cacheable(auto_versioning=False)` | **Only if you need old behavior** |
---
## What is Auto-Versioning?
### The Problem It Solves
```python
# Without auto-versioning
@cacheable(auto_versioning=False)
def calculate_returns(prices):
    return prices.pct_change()

calculate_returns(df)  # Cache miss, stores result
calculate_returns(df)  # Cache hit ✓

# Developer fixes a bug (same decorator, new body)
@cacheable(auto_versioning=False)
def calculate_returns(prices):
    return prices.pct_change().fillna(0)  # Bug fix!

calculate_returns(df)  # Cache HIT - WRONG RESULT! ❌
# Returns OLD buggy result from cache
```
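The trap is easy to reproduce with a toy memoizer keyed only on the arguments (everything below is illustrative, not the library's code):

```python
# Toy cache keyed only on arguments -- the function's code is not part of the key
cache = {}

def cached_call(func, x):
    """Return a cached result for x, computing it with func on first use."""
    if x not in cache:
        cache[x] = func(x)
    return cache[x]

def buggy(x):
    return x * 2      # first version of the code

def fixed(x):
    return x * 2 + 1  # the "bug fix"

print(cached_call(buggy, 5))  # 10, computed and stored
print(cached_call(fixed, 5))  # still 10 -- stale result from the old code!
```

The second call returns the stale value because the key (`5`) is identical and nothing about the function's code participates in the lookup.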
### With Auto-Versioning (Now Default)
```python
# With auto-versioning (NEW DEFAULT)
@cacheable()  # auto_versioning=True by default
def calculate_returns(prices):
    return prices.pct_change()

calculate_returns(df)  # Cache miss, stores at key "v_abc123..."
calculate_returns(df)  # Cache hit ✓

# Developer fixes bug (same decorator, new body)
@cacheable()
def calculate_returns(prices):
    return prices.pct_change().fillna(0)  # Bug fix!

calculate_returns(df)  # Cache MISS - new key "v_def456..." ✓
# Computes with NEW correct code
```
---
## Migration Steps
### Step 1: Update `smart_cacheable` (REQUIRED)
**Old code:**
```python
from afml.cache import smart_cacheable

@smart_cacheable
def my_function(data):
    return data.mean()
```
**New code:**
```python
from afml.cache import cacheable

@cacheable()  # That's it! auto_versioning is now default
def my_function(data):
    return data.mean()
```
### Step 2: Review Expensive Functions (OPTIONAL)
If you have functions that take **hours to compute** and **rarely change**:
```python
@cacheable(auto_versioning=False)  # Explicit opt-out
def train_huge_model(data):
    """Takes 48 hours, changes once per year"""
    return expensive_training(data)
```
⚠️ **Warning**: With `auto_versioning=False`, code changes do NOT invalidate the cache:
```python
@cacheable(auto_versioning=False)
def train_huge_model(data):
    """Added this docstring"""  # THIS CHANGE WON'T INVALIDATE CACHE
    return expensive_training(data)  # May return stale result!
```
### Step 3: Clean Up Old Caches (RECOMMENDED)
After migration, clean up orphaned caches:
```python
from afml.cache import cache_maintenance

# One-time cleanup after migration
cache_maintenance(
    clean_orphaned=True,
    max_cache_size_mb=1000,
    max_age_days=30,
)
```
---
## Understanding Auto-Versioning Behavior
### How Cache Keys Work
**Without auto-versioning:**
```
cache_key = md5("module.function_name" + "arg_hashes")
          = "a1b2c3d4..."
```
**With auto-versioning (default):**
```
cache_key = md5("module.function_name" + "v_abc123" + "arg_hashes")
                                         ^^^^^^^^^^
                                         function source hash
          = "e5f6a7b8..."  # Different key!
```
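A minimal sketch of how such a version component can be derived (the helper name and key layout are illustrative; the library's internals may differ):

```python
import hashlib

def version_tag(source: str) -> str:
    """Derive a short version tag from a function's source text (illustrative)."""
    return "v_" + hashlib.md5(source.encode()).hexdigest()[:8]

old_src = "def calculate_returns(prices):\n    return prices.pct_change()\n"
new_src = old_src.replace("pct_change()", "pct_change().fillna(0)")

# Any edit to the source yields a different tag, hence a different cache key
print(version_tag(old_src) == version_tag(new_src))  # False
```

Because the tag feeds into the cache key, the edited function automatically looks up (and misses) a fresh key.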
### When Cache Invalidates
Cache invalidates when:
- ✅ Function body changes
- ✅ Function name changes
- ✅ Default parameters change
- ✅ Decorators change

Cache does **not** invalidate when:
- ❌ Comments change
- ❌ Docstrings change
### Graceful Fallback
For built-in/dynamic functions where source is unavailable:
```python
# Can't get source for built-ins
import numpy as np

@cacheable()  # Gracefully falls back to file mtime
def use_builtin(data):
    return np.mean(data)  # np.mean has no source

# Warning logged, but doesn't crash
```
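The mtime fallback described above might look roughly like this (a sketch under stated assumptions; `version_from_mtime` is a hypothetical helper, not the library's actual code):

```python
import os

def version_from_mtime(path):
    """Fallback version tag from a file's modification time.
    Used when the function's source cannot be read and hashed."""
    try:
        return f"v_mtime_{os.path.getmtime(path):.0f}"
    except OSError:
        return None  # no file either -- caller must decide (e.g. skip versioning)
```

Editing the file bumps its mtime, so the tag still changes on save, just with coarser granularity than a source hash.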
---
## Common Scenarios
### Scenario 1: Development (Default - No Changes Needed)
```python
from afml.cache import cacheable

@cacheable()  # Just use defaults!
def my_feature(data, window):
    """Feature under active development"""
    return data.rolling(window).mean()

# Work normally - cache auto-invalidates on changes
result1 = my_feature(df, 20)
result2 = my_feature(df, 20)  # Cache hit
# ... modify my_feature ...
result3 = my_feature(df, 20)  # Cache miss (automatic!)
```
### Scenario 2: Expensive Computation (Explicit Opt-Out)
```python
from afml.cache import cacheable

@cacheable(auto_versioning=False)  # Explicit opt-out
def train_production_model(data):
    """Takes 24 hours, changes rarely, want to preserve cache"""
    return expensive_training(data)
```
### Scenario 3: Bulk Opt-Out for Stable Functions
```python
from afml.cache import disable_auto_versioning

# Create custom decorator without versioning
cacheable_stable = disable_auto_versioning()

@cacheable_stable()
def stable_func_1(data): ...

@cacheable_stable()
def stable_func_2(data): ...

@cacheable_stable(time_aware=True)  # Can combine with other options
def stable_func_3(data): ...
```
### Scenario 4: Mixed Strategy
```python
from afml.cache import cacheable

# Under development - auto-versioning
@cacheable()
def experimental_feature(data):
    return data.ewm(span=20).mean()

# Production stable - opt-out
@cacheable(auto_versioning=False)
def load_data(symbol, start, end):
    return expensive_data_load(symbol, start, end)
```
---
## Maintenance & Cleanup
### Periodic Cleanup (Recommended)
Set up weekly/monthly cleanup:
```python
from afml.cache import cache_maintenance

# Run weekly via cron/scheduler
cache_maintenance(
    clean_orphaned=True,      # Remove old function versions
    max_cache_size_mb=2000,   # Enforce size limit
    max_age_days=90,          # Remove very old caches
    min_orphan_age_hours=48,  # Keep recent orphans (grace period)
)
```
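One way to schedule this is a crontab entry invoking a small wrapper script around the call above (the interpreter path and script name are placeholders):

```shell
# Run the cache cleanup script every Sunday at 03:00
0 3 * * 0 /usr/bin/python3 /path/to/cache_cleanup.py
```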
### Analyze Cache Fragmentation
Check if auto-versioning is creating too many versions:
```python
from afml.cache import print_version_analysis

print_version_analysis()
# Output:
# ========================================
# CACHE VERSION ANALYSIS
# ========================================
# Functions with versions: 12
# Total versions: 34
# Total size: 1.2 GB
#
# Top fragmented functions:
#   1. calculate_feature
#      Versions: 8
#      Size: 450 MB
```
If fragmentation is high, consider opting out for those functions.
---
## Performance Implications
### Overhead of Auto-Versioning
**Minimal overhead** - hash computed once at decorator application:
```python
# Old smart_cacheable: ~0.5ms PER CALL
@smart_cacheable  # Read source + hash on EVERY call
def fast_func(x):
    return x + 1

# New auto_versioning: ~0ms per call
@cacheable()  # Hash computed ONCE at import time
def fast_func(x):
    return x + 1
```
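The once-at-decoration pattern can be sketched as follows (illustrative, not the library's actual code); `inspect.getsource` can fail for dynamic or built-in functions, which is handled up front:

```python
import functools
import hashlib
import inspect

def versioned(func):
    """Compute the source hash once, at decoration time,
    so calls pay no per-call hashing cost."""
    try:
        tag = hashlib.md5(inspect.getsource(func).encode()).hexdigest()[:8]
    except (OSError, TypeError):
        tag = "no_source"  # graceful-fallback point (e.g. switch to mtime)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)  # tag already computed; no hashing here

    wrapper.version = tag  # available for cache-key construction
    return wrapper
```

Decorating `fast_func` with this pattern pays the hashing cost exactly once, at import time, which is why the per-call overhead drops to effectively zero.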
### Storage Implications
With auto-versioning, multiple versions can coexist temporarily:
```bash
cache/
  my_module/
    my_function/
      v_abc123_args_xyz/   # Version 1 (orphaned)
      v_def456_args_xyz/   # Version 2 (current)
      v_ghi789_args_xyz/   # Version 3 (current)
```
**Mitigation**: Run `cache_maintenance()` periodically to clean orphans.
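To see how many versions have piled up, directories matching that layout can be counted with a small helper (hypothetical; adapt the glob pattern to your actual cache layout):

```python
from collections import Counter
from pathlib import Path

def count_versions(cache_root):
    """Count 'v_*' version directories per function under
    cache_root/<module>/<function>/v_.../ (layout as sketched above)."""
    counts = Counter()
    for vdir in Path(cache_root).glob("*/*/v_*"):
        if vdir.is_dir():
            counts[vdir.parent.name] += 1
    return counts
```

Functions with many versions are the first candidates for a cleanup run or an explicit opt-out.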
---
## Testing Your Migration
### 1. Check for `smart_cacheable` usage
```bash
# This should find zero results after migration
grep -r "smart_cacheable" your_project/
```
### 2. Test auto-versioning behavior
```python
from afml.cache import cacheable

@cacheable()
def test_func(x):
    return x * 2

# First call
result1 = test_func(5)  # Cache miss

# Second call (should hit)
result2 = test_func(5)  # Cache hit

# Change the function -- keep the decorator (in practice, edit the source)
@cacheable()
def test_func(x):
    return x * 3  # Changed!

# Third call (should miss due to version change)
result3 = test_func(5)  # Cache miss (automatic!)
assert result3 == 15  # New result
```
### 3. Verify cleanup works
```python
from afml.cache import find_orphaned_caches
orphans = find_orphaned_caches()
print(f"Found {orphans['orphaned_count']} orphaned caches")
print(f"Total size: {orphans['total_size_mb']} MB")
```
---
## Troubleshooting
### Issue: Cache not invalidating on changes
**Cause**: Function source unavailable (built-in/dynamic)
**Solution**: Check logs for warnings:
```python
# Look for:
# "Cannot hash source for my_func, using file mtime for versioning"
```
If file mtime also fails, explicitly use `auto_versioning=False` and manage manually.
### Issue: Too many cache versions
**Cause**: Rapid development with many changes
**Solution**: Run cleanup more frequently:
```python
from afml.cache import cache_maintenance

cache_maintenance(
    clean_orphaned=True,
    min_orphan_age_hours=12,  # More aggressive
)
```
### Issue: Expensive function cache lost
**Cause**: Auto-versioning invalidated cache on minor change
**Solution**: Opt-out for that specific function:
```python
@cacheable(auto_versioning=False)
def expensive_stable_function(data):
    return days_of_computation(data)
```
---
## Backward Compatibility
### Old Decorator Aliases
These still work (no changes needed):
```python
from afml.cache import (
    robust_cacheable,      # = cacheable()
    time_aware_cacheable,  # = cacheable(time_aware=True)
    cv_cacheable,          # = cacheable()
)

# All now have auto_versioning=True by default
```
### Disabling Auto-Versioning Globally
If you want old behavior everywhere (not recommended):
```python
# In your __init__.py or main module
from afml.cache import disable_auto_versioning
# Use this instead of cacheable
cacheable = disable_auto_versioning()
# Now all @cacheable() calls have auto_versioning=False
```
---
## Getting Help
### Check Cache Health
```python
from afml.cache import print_cache_report
print_cache_report()
```
### Debug Specific Function
```python
from afml.cache import debug_function_cache
debug_function_cache("afml.features.my_func")
```
### Analyze Version Fragmentation
```python
from afml.cache import analyze_cache_versions, print_version_analysis
analysis = analyze_cache_versions()
print_version_analysis()
```
---
## Summary
**What You Need to Do:**
1. Replace `@smart_cacheable` with `@cacheable()` (required)
2. Review expensive functions and opt-out if needed (optional)
3. Set up periodic cache maintenance (recommended)
**What's Better Now:**
- Automatic cache invalidation on code changes (correctness)
- No per-call overhead (performance)
- Complete invalidation for all args (reliability)
- Simpler mental model (clarity)
**Default is Correct:**
- `auto_versioning=True` prevents stale cache bugs
- Only opt-out for specific expensive stable functions
- When in doubt, use the default