This playbook highlights the most common operational quirks you may encounter while running LightDiffusion-Next and the quickest ways to resolve them.

GPU memory headaches

| Symptom | Likely cause | Quick fixes |
| --- | --- | --- |
| CUDA out of memory during base diffusion | Resolution or batch too high | Drop to 512×512 or smaller, decrease batch to 1, disable HiresFix or AutoDetailer, prefer Euler/Karras samplers in CFG++ mode |
| OOM triggered midway through HiRes | VRAM spikes when loading the VAE/second UNet | Enable "Keep models loaded" (to avoid reloading) or run HiRes on CPU by toggling "VAE on CPU" in settings |
| Flux runs crash immediately | Missing Flux decoder or running on <16 GB VRAM | Place Flux weights in include/Flux; disable Flux or use the SD1.5 profile on smaller cards |

Additional tips:

  • Enable VRAM budget in Streamlit to see live usage (requires LD_SHOW_VRAM=1).
  • In Docker, pass --gpus all and ensure NVIDIA_VISIBLE_DEVICES is not empty.
  • Clear ~/.cache/torch_extensions if Stable-Fast kernels were compiled against an older driver and now fail to load.
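After a driver upgrade, stale compiled kernels are the usual culprit. A minimal sketch of the cache reset (TORCH_EXTENSIONS_DIR is PyTorch's standard override variable; the default path is the one noted above):

```shell
# Clear stale Stable-Fast / SageAttention kernels after a driver upgrade so
# they recompile on the next launch. TORCH_EXTENSIONS_DIR is PyTorch's
# optional override; the default matches the path mentioned above.
CACHE_DIR="${TORCH_EXTENSIONS_DIR:-$HOME/.cache/torch_extensions}"

if [ -d "$CACHE_DIR" ]; then
    echo "Removing stale kernel cache at $CACHE_DIR"
    rm -rf "$CACHE_DIR"
fi
```

Expect the first run after clearing to be slow again while kernels rebuild.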

Slow first runs or repeated recompilation

  • Stable-Fast and SageAttention compile custom kernels on first use. This can take several minutes. Once complete, the compiled artifacts live under ~/.cache/torch_extensions (host) or /root/.cache/torch_extensions (Docker). Mount this directory as a volume for faster cold starts.
  • If Streamlit re-compiles every launch, ensure the container or user has write access to the cache directory and that the system clock is correct.
  • Set LD_DISABLE_SAGE_ATTENTION=1 to isolate issues related specifically to SageAttention.
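A quick pre-launch check can distinguish a permissions problem from a genuine recompile. This is a sketch, not part of the project itself:

```shell
# Sketch: verify the kernel cache directory is writable before launching the
# UI; if this fails, Stable-Fast and SageAttention recompile on every start.
CACHE_DIR="${TORCH_EXTENSIONS_DIR:-$HOME/.cache/torch_extensions}"
mkdir -p "$CACHE_DIR"

if touch "$CACHE_DIR/.write_test" 2>/dev/null; then
    rm -f "$CACHE_DIR/.write_test"
    CACHE_WRITABLE=1
    echo "kernel cache writable: $CACHE_DIR"
else
    CACHE_WRITABLE=0
    echo "kernel cache NOT writable; fix permissions on $CACHE_DIR" >&2
fi
```

In Docker, the same directory can be persisted across containers with a volume mount such as `-v ~/.cache/torch_extensions:/root/.cache/torch_extensions`, as described above.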

Downloader complaints about missing assets

  • The startup checks look for standard filenames (e.g., yolov8n.pt, taesdxl_decoder.safetensors). Verify these live under the correct subdirectories in include/.
  • For offline setups, drop the files manually and create empty .ok sentinels (e.g., include/checkpoints/.downloads-ok) to skip prompts.
  • Hugging Face rate limits manifest as HTTP 429. Provide a token via the prompt, set HF_TOKEN in the environment, or download the files manually.
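The offline steps above can be sketched as follows (directory names follow the layout described in this section; the token value is a placeholder):

```shell
# Offline setup sketch: stage model files by hand, then create the sentinel
# so startup checks skip the download prompts.
mkdir -p include/checkpoints include/Flux
# ...copy your .safetensors / .pt files into these directories here...
touch include/checkpoints/.downloads-ok

# When online but rate-limited (HTTP 429), supply a Hugging Face token:
export HF_TOKEN="hf_xxxxxxxx"   # placeholder; use your own token
```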

Streamlit UI quirks

  • Preview stuck on “Waiting for GPU” – Check FastAPI logs; the batching worker may be paused. Restart the Streamlit session or run python server.py to inspect queue telemetry.
  • Settings reset on restart – Ensure the process can write to webui_settings.json. Remove the file to revert to defaults if it becomes corrupted.
  • History thumbnails missing – Delete the entry under ui/history/<timestamp>; the next render will recreate previews.
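The settings-reset fix above can be automated with a small validity check. This is a sketch; the corrupt file is created here only to simulate the failure mode:

```shell
# Sketch: detect a corrupted webui_settings.json and move it aside so the UI
# regenerates defaults on the next start.
printf 'not valid json' > webui_settings.json   # simulate corruption

if ! python3 -m json.tool webui_settings.json >/dev/null 2>&1; then
    echo "settings file corrupt; backing it up and reverting to defaults"
    mv webui_settings.json webui_settings.json.bak
fi
```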

Gradio or API automation issues

  • /api/generate returns 500 with “No images produced”: inspect server logs for Pipeline import error or missing models. Ensure pipeline.py is importable and the working directory is the repository root.
  • Jobs appear stuck: call /api/telemetry to inspect pending_by_signature. Jobs with mixed resolutions or differing toggles cannot batch together; for single-job automation, set LD_BATCH_WAIT_SINGLETONS=0 to avoid coalescing delays.
  • Health checks: /health returns { "status": "ok" }. If it fails, the FastAPI app likely crashed—restart and inspect logs/server.log.
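A health gate like the one below is useful before submitting automated jobs. The curl line is shown commented out and a canned response stands in for it; the base URL is an assumption, so adjust it to your deployment:

```shell
# Sketch: automation-side health probe before calling /api/generate.
# RESPONSE=$(curl -fsS "http://localhost:8000/health")   # assumed port
RESPONSE='{ "status": "ok" }'   # sample payload matching the docs above

if printf '%s' "$RESPONSE" | grep -q '"status": *"ok"'; then
    HEALTHY=1
    echo "server healthy; safe to submit jobs"
else
    HEALTHY=0
    echo "health check failed; restart and inspect logs/server.log" >&2
fi
```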

Docker-specific notes

  • Always build with the provided Dockerfile to get SageAttention patches precompiled.
  • Forward model assets by mounting ./include into the container (-v $(pwd)/include:/app/include).
  • On Windows + WSL2, ensure the WSL distro has the NVIDIA driver bridge (wsl --status).

Logging & diagnostics

  • Server logs live under logs/server.log with per-request IDs. Tail them during load testing: tail -f logs/server.log.
  • Enable debug logging by exporting LD_SERVER_LOGLEVEL=DEBUG before launching Streamlit/Gradio/uvicorn.
  • To inspect queue depth without hitting the API, watch the GenerationBuffer logs; each batch prints signature summaries.
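Putting the logging steps together, a typical debug session looks like this sketch (the request ID shown is hypothetical; use one from your own log):

```shell
# Sketch: turn on debug logging, then follow one request through the server
# log by its per-request ID.
export LD_SERVER_LOGLEVEL=DEBUG
# Start Streamlit/Gradio/uvicorn in another terminal, then:
# tail -f logs/server.log | grep 'req-abc123'   # hypothetical request ID
echo "log level set to $LD_SERVER_LOGLEVEL"
```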

When all else fails

  • Clear the include/last_seed.txt file if seed reuse behaves unexpectedly.
  • Regenerate Stable-Fast kernels by deleting the cache directory and re-running with stable_fast enabled.
  • Collect the following before opening an issue: GPU model, driver version, operating system, a copy of logs/server.log, hardware info from /api/telemetry, and reproduction steps.