# Prompt Attention Caching
### What It Does
Caches CLIP text embeddings for prompts you've already encoded. When you reuse a prompt (or parts of it), the embedding is retrieved from cache instead of being recomputed.
### When It Helps Most
- Batch generation with the same prompt
- Testing different seeds against a fixed prompt
- Incremental prompt refinement
- Generation sessions with repeated themes
### Configuration
**Enable/Disable** (default: enabled):
```python
from src.Utilities import prompt_cache
# Enable (default)
prompt_cache.enable_prompt_cache(True)
# Disable
prompt_cache.enable_prompt_cache(False)
# Check the hit rate
stats = prompt_cache.get_cache_stats()
print(f"Hit rate: {stats['hit_rate']:.1%}")
```
**Cache Settings**:
- Maximum entries: 256 prompts before pruning
- Cache structure: global dict keyed by prompt hash and CLIP identity
- Memory usage: workload-dependent, estimated from cached embedding tensors
- Cache cleared on: restart, disable, or manual clear
- Automatic pruning: removes the oldest 25% of entries when the cache exceeds its limit
### Viewing Cache Stats
```python
from src.Utilities import prompt_cache
# Print statistics
prompt_cache.print_cache_stats()
# Output:
# ============================================================
# Prompt Cache Statistics
# ============================================================
# Status: Enabled
# Entries: 42
# Size: ~85.3 MB
# Requests: 150 (hits: 108, misses: 42)
# Hit Rate: 72.0%
# ============================================================
```
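
The hit rate in the sample output follows directly from the request counters. A small sketch of that arithmetic, using a hypothetical `hit_rate` helper that is not part of `prompt_cache`:

```python
def hit_rate(hits: int, misses: int) -> float:
    """Fraction of requests served from the cache (0.0 when there are none)."""
    requests = hits + misses
    return hits / requests if requests else 0.0


# Counters from the sample output above: 108 hits + 42 misses = 150 requests.
print(f"Hit Rate: {hit_rate(108, 42):.1%}")  # prints "Hit Rate: 72.0%"
```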
### Best Practices
1. **Leave it enabled** - the overhead is negligible, and repeated prompts skip re-encoding
2. **Monitor the hit rate** - it should stay above 50% in typical workflows
3. **Clear the cache** when switching models or after major prompt changes
4. **Batch similar prompts** to maximize cache hits
5. **Expect global behavior** - the cache is shared across all prompt encodes rather than scoped to a single generation session