1. Prompt Attention Caching
What It Does
Caches CLIP text embeddings for prompts you've already encoded. When you reuse a prompt (or parts of it), the embedding is retrieved from the cache instead of being recomputed.
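Conceptually, the mechanism is a lookup table keyed by the exact prompt string. The sketch below illustrates the idea only; it is not the module's implementation, and `encode_text` is a stand-in for the real CLIP encoder call:

```python
# Illustrative sketch only, not the module's actual implementation.
_cache = {}

def get_embedding(prompt, encode_text):
    """Return the embedding for `prompt`, encoding only on a cache miss."""
    if prompt not in _cache:
        _cache[prompt] = encode_text(prompt)  # the expensive CLIP text encode
    return _cache[prompt]
```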
When It Helps Most
- Batch generation with the same prompt
- Testing different seeds for a fixed prompt (see the sketch after this list)
- Incremental prompt refinement
- Generation sessions with repeated themes
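The seed-sweep case is the clearest win: the prompt is identical on every iteration, so only the first call pays for encoding. A minimal sketch follows, where `generate` is a hypothetical stand-in for whatever pipeline call your workflow uses:

```python
from src.Utilities import prompt_cache

def generate(prompt, seed):
    """Stand-in for the actual generation call (hypothetical)."""
    ...

prompt = "a watercolor lighthouse at dusk"

# With a real pipeline, only the first iteration encodes the prompt;
# the remaining seven reuse the cached embedding, since the prompt is fixed.
for seed in range(8):
    generate(prompt, seed=seed)

prompt_cache.print_cache_stats()
```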
Configuration
Enable/Disable (default: enabled):
```python
from src.Utilities import prompt_cache

# Enable (default)
prompt_cache.enable_prompt_cache(True)

# Disable
prompt_cache.enable_prompt_cache(False)

# Check status
stats = prompt_cache.get_cache_stats()
print(f"Hit rate: {stats['hit_rate']:.1%}")
```
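If you need to rule out the cache while debugging, or benchmark cold encodes, the two toggles compose into a small helper. This is a sketch built only on the documented functions; it assumes the cache was in its default enabled state beforehand:

```python
from contextlib import contextmanager

from src.Utilities import prompt_cache

@contextmanager
def cache_disabled():
    """Temporarily disable the prompt cache (assumes it was enabled before)."""
    prompt_cache.enable_prompt_cache(False)
    try:
        yield
    finally:
        prompt_cache.enable_prompt_cache(True)  # restore the default state

# Usage: every prompt inside this block is re-encoded from scratch.
# with cache_disabled():
#     ...
```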
Cache Settings:
- Maximum entries: 128 prompts
- Memory usage: ~50-200 MB
- Cache cleared on: restart or manual clear
- Automatic pruning: removes the oldest 25% of entries when full
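The pruning rule is easy to picture with an insertion-ordered dict: once the 128-entry limit is reached, the 32 oldest entries are dropped. This is a sketch of the stated policy, not the module's code, and the real eviction order may differ in detail:

```python
from collections import OrderedDict

MAX_ENTRIES = 128

def prune(cache: OrderedDict) -> None:
    """Drop the oldest 25% of entries once the cache reaches its limit."""
    if len(cache) >= MAX_ENTRIES:
        for _ in range(MAX_ENTRIES // 4):  # 25% of 128 = 32 entries
            cache.popitem(last=False)      # pop from the oldest end
```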
Viewing Cache Stats
```python
from src.Utilities import prompt_cache

# Print statistics
prompt_cache.print_cache_stats()

# Output:
# ============================================================
# Prompt Cache Statistics
# ============================================================
# Status: Enabled
# Entries: 42
# Size: ~85.3 MB
# Requests: 150 (hits: 108, misses: 42)
# Hit Rate: 72.0%
# ============================================================
```
Best Practices
- Leave it enabled: the overhead is negligible and the gains are significant
- Monitor the hit rate: it should stay above 50% in typical workflows (see the sketch after this list)
- Clear the cache when switching models or making major prompt changes
- Batch similar prompts together to maximize cache hits
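The last three practices combine into a short pre-run check. The hit-rate read uses the documented `get_cache_stats()`; the clear call is hypothetical, since this guide does not name the manual-clear function, so check the `prompt_cache` module for the actual name:

```python
from src.Utilities import prompt_cache

def check_cache_health(threshold=0.5):
    """Warn when the hit rate falls below the recommended 50%."""
    stats = prompt_cache.get_cache_stats()
    if stats["hit_rate"] < threshold:
        print(f"Low hit rate ({stats['hit_rate']:.1%}): "
              "batch similar prompts or expect cold encodes.")

# When switching models, clear stale embeddings first. The function
# name below is a guess; substitute the module's actual clear call.
# prompt_cache.clear_prompt_cache()
```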