1. Prompt Attention Caching
What It Does
Caches CLIP text embeddings for prompts you've already encoded. When you reuse a prompt (or parts of it), the embedding is retrieved from the cache instead of being recomputed.
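Conceptually, the mechanism is a lookup table keyed by the exact prompt string. The sketch below illustrates the idea only; it is not the module's implementation, and `encode_text` is a stand-in for the real CLIP encoder call:

```python
# Illustrative sketch only, not the module's actual implementation.
_cache = {}

def get_embedding(prompt, encode_text):
    """Return the embedding for `prompt`, encoding only on a cache miss."""
    if prompt not in _cache:
        _cache[prompt] = encode_text(prompt)  # the expensive CLIP text encode
    return _cache[prompt]
```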
When It Helps Most
- Batch generation with the same prompt
- Testing different seeds for a fixed prompt (see the sketch after this list)
- Incremental prompt refinement
- Generation sessions with repeated themes
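The seed-sweep case is the clearest win: the prompt is identical on every iteration, so only the first call pays for encoding. A minimal sketch follows, where `generate` is a hypothetical stand-in for whatever pipeline call your workflow uses:

```python
from src.Utilities import prompt_cache

def generate(prompt, seed):
    """Stand-in for the actual generation call (hypothetical)."""
    ...

prompt = "a watercolor lighthouse at dusk"

# With a real pipeline, only the first iteration encodes the prompt;
# the remaining seven reuse the cached embedding, since the prompt is fixed.
for seed in range(8):
    generate(prompt, seed=seed)

prompt_cache.print_cache_stats()
```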
Configuration
Enable/Disable (default: enabled):
```python
from src.Utilities import prompt_cache

# Enable (default)
prompt_cache.enable_prompt_cache(True)

# Disable
prompt_cache.enable_prompt_cache(False)

# Check status
stats = prompt_cache.get_cache_stats()
print(f"Hit rate: {stats['hit_rate']:.1%}")
```
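If you need to rule out the cache while debugging, or benchmark cold encodes, the two toggles compose into a small helper. This is a sketch built only on the documented functions; it assumes the cache was in its default enabled state beforehand:

```python
from contextlib import contextmanager

from src.Utilities import prompt_cache

@contextmanager
def cache_disabled():
    """Temporarily disable the prompt cache (assumes it was enabled before)."""
    prompt_cache.enable_prompt_cache(False)
    try:
        yield
    finally:
        prompt_cache.enable_prompt_cache(True)  # restore the default state

# Usage: every prompt inside this block is re-encoded from scratch.
# with cache_disabled():
#     ...
```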
Cache Settings:
- Maximum entries: 128 prompts
- Memory usage: ~50-200 MB
- Cache cleared on: restart or manual clear
- Automatic pruning: removes the oldest 25% of entries when full
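The pruning rule is easy to picture with an insertion-ordered dict: once the 128-entry limit is reached, the 32 oldest entries are dropped. This is a sketch of the stated policy, not the module's code, and the real eviction order may differ in detail:

```python
from collections import OrderedDict

MAX_ENTRIES = 128

def prune(cache: OrderedDict) -> None:
    """Drop the oldest 25% of entries once the cache reaches its limit."""
    if len(cache) >= MAX_ENTRIES:
        for _ in range(MAX_ENTRIES // 4):  # 25% of 128 = 32 entries
            cache.popitem(last=False)      # pop from the oldest end
```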
Viewing Cache Stats
```python
from src.Utilities import prompt_cache

# Print statistics
prompt_cache.print_cache_stats()

# Output:
# ============================================================
# Prompt Cache Statistics
# ============================================================
# Status: Enabled
# Entries: 42
# Size: ~85.3 MB
# Requests: 150 (hits: 108, misses: 42)
# Hit Rate: 72.0%
# ============================================================
```
Best Practices
- Leave it enabled: the overhead is negligible and the gains are significant
- Monitor the hit rate: it should stay above 50% in typical workflows (see the sketch after this list)
- Clear the cache when switching models or making major prompt changes
- Batch similar prompts together to maximize cache hits
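The last three practices combine into a short pre-run check. The hit-rate read uses the documented `get_cache_stats()`; the clear call is hypothetical, since this guide does not name the manual-clear function, so check the `prompt_cache` module for the actual name:

```python
from src.Utilities import prompt_cache

def check_cache_health(threshold=0.5):
    """Warn when the hit rate falls below the recommended 50%."""
    stats = prompt_cache.get_cache_stats()
    if stats["hit_rate"] < threshold:
        print(f"Low hit rate ({stats['hit_rate']:.1%}): "
              "batch similar prompts or expect cold encodes.")

# When switching models, clear stale embeddings first. The function
# name below is a guess; substitute the module's actual clear call.
# prompt_cache.clear_prompt_cache()
```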