LightDiffusion-Next can run locally on Windows or Linux, inside Docker, or on cloud GPUs. This page walks you through the supported installation paths and the assets you must download before your first generation.
## Hardware & software requirements
The project is tuned for NVIDIA GPUs and CUDA 12.x drivers, but it also supports AMD GPUs with ROCm and Apple Silicon with Metal Performance Shaders (MPS). See ROCm and Metal/MPS Support for platform-specific installation instructions.
- Operating system: Windows 10/11, Ubuntu 22.04+, macOS 12.3+ (for Apple Silicon), or any distro supported by the NVIDIA Container Toolkit.
- Python: 3.10.x. The run scripts create a virtual environment automatically.
- GPU:
  - NVIDIA: a card with compute capability 8.0 (Ampere) or newer for SageAttention/SpargeAttn. RTX 50-series cards (compute capability 12.0) run with SageAttention + Stable-Fast.
  - AMD: RDNA 2+ or CDNA architectures with ROCm 5.0+. See ROCm Support.
  - Apple Silicon: M1/M2/M3 series with macOS 12.3+. See Metal/MPS Support.
- VRAM: 6 GB minimum (12 GB recommended) for SD1.5 workflows. Flux quantized pipelines require 16 GB+ for comfortable batching.
- Disk space: ~15 GB for dependencies, plus room for your checkpoints, LoRAs and Flux assets.
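If you want to confirm these prerequisites up front on an NVIDIA machine, a quick check might look like the following (the `compute_cap` query needs a reasonably recent driver; ROCm and MPS users should rely on their platform's own tooling):

```bash
# Check the Python build and the GPU the run scripts will detect.
python --version   # expect Python 3.10.x
nvidia-smi --query-gpu=name,driver_version,memory.total,compute_cap --format=csv
```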
## Choose an installation path
### Windows quick start (`run.bat`)
The root repository ships with a convenience script that handles environment creation, dependency installation via `uv`, GPU detection and launching the Streamlit UI.

- Install the latest Python 3.10 build and ensure `python` is on your `PATH`.
- Install the NVIDIA CUDA 12 runtime driver that matches your GPU.
- Clone the repository and place your checkpoints in `include/checkpoints` (see Model assets).
- Launch `run.bat`, either by double-clicking it or by running it from a terminal. The script will:
  - Create `.venv` (if it does not exist) and upgrade `pip`.
  - Install `uv` for fast dependency resolution.
  - Detect an NVIDIA GPU via `nvidia-smi` and install the matching PyTorch wheels.
  - Install all requirements and start Streamlit at `http://localhost:8501`.
- When you are done, close the terminal to stop the UI. The virtual environment is reusable across runs.

Tip: To launch the Gradio UI instead, activate `.venv` and run `python app.py`.
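For example, from Git Bash at the repository root (a sketch; cmd and PowerShell users should run `.venv\Scripts\activate` instead of `source`):

```bash
# Reuse the environment that run.bat created and start the Gradio UI.
source .venv/Scripts/activate
python app.py
```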
### Linux/WSL2 manual setup
- Install system dependencies:

  ```bash
  sudo apt update && sudo apt install python3.10 python3.10-venv python3-pip build-essential git
  ```

- (Optional) Install the NVIDIA CUDA 12 toolkit so SageAttention/SpargeAttn can compile native extensions.
- Create and activate a virtual environment:

  ```bash
  python3 -m venv .venv
  source .venv/bin/activate
  pip install --upgrade pip uv
  ```

- Install PyTorch and core dependencies (a quick GPU check follows this list):

  ```bash
  uv pip install --index-url https://download.pytorch.org/whl/cu128 torch torchvision "triton>=2.1.0"
  uv pip install -r requirements.txt
  ```

- Launch the Streamlit UI:

  ```bash
  streamlit run streamlit_app.py --server.address=0.0.0.0 --server.port=8501
  ```

  Use `python app.py` if you prefer the Gradio interface.
- Deactivate the environment with `deactivate` when finished.
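As a quick verification after the PyTorch install step, you can confirm that the wheels actually see your GPU before launching the UI:

```bash
# Should print the torch version and True on a working CUDA setup.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```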
### Docker and containers
Use Docker when you want an immutable runtime with SageAttention, SpargeAttn and Stable-Fast prebuilt.
- Install Docker Desktop or Docker Engine with the NVIDIA Container Toolkit (a sanity check follows this list).
- Clone the repository and review `docker-compose.yml`. Adjust:
  - `TORCH_CUDA_ARCH_LIST` if you only target a specific GPU architecture.
  - The `INSTALL_STABLE_FAST` and `INSTALL_OLLAMA` build arguments if you want Stable-Fast or the Ollama prompt enhancer baked into the image.
  - Volume mounts for `output/` and the `include/*` directories where you store checkpoints, LoRAs, embeddings and YOLO detectors.
- Build and start the stack:

  ```bash
  docker-compose up --build
  ```

  Streamlit is exposed on `http://localhost:8501` by default; Gradio is mapped to port `7860` and can be enabled by setting `UI_FRAMEWORK=gradio`.
- To rebuild with a different GPU architecture or optional component:

  ```bash
  docker-compose build --build-arg TORCH_CUDA_ARCH_LIST="9.0" --build-arg INSTALL_STABLE_FAST=1
  ```
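Before the first build, it is worth confirming that the NVIDIA Container Toolkit is wired into Docker; any CUDA base image works for this check:

```bash
# If GPU passthrough works, this prints the same table as nvidia-smi on the host.
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```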
### Running only the FastAPI server
If you want to integrate LightDiffusion-Next into automation pipelines or Discord bots, run the backend without launching a UI.
- Follow any of the setup methods above.
- Run:

  ```bash
  uvicorn server:app --host 0.0.0.0 --port 7861
  ```

- Use the REST API reference to submit generation jobs via `POST /api/generate` and inspect queue health via `GET /api/telemetry` (see the example below).
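As an illustration only — the request body below is an assumption, so check the REST API reference for the actual field names — a job submission from the shell could look like:

```bash
# Hypothetical payload; the "prompt" field is assumed, not confirmed.
curl -X POST http://localhost:7861/api/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "a misty forest at dawn"}'

# Queue health (documented endpoint).
curl http://localhost:7861/api/telemetry
```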
## Model assets
LightDiffusion-Next does not bundle model weights. Place your assets into the `include/` tree before you start generating.

- `include/checkpoints/` — SD1.5-style `.safetensors` checkpoints (e.g. Meina V10, DreamShaper). The default pipeline expects a file named `Meina V10 - baked VAE.safetensors` unless you override it.
- `include/vae/ae.safetensors` — the Flux VAE (download from black-forest-labs/FLUX.1-schnell; see the example below). Required for Flux mode.
- `include/loras/` — LoRA adapters loaded from the UI or CLI.
- `include/embeddings/` — negative prompt embeddings such as `EasyNegative` and `badhandv4`.
- `include/yolos/` — YOLO detectors used by ADetailer (`person_yolov8m-seg.pt`, `face_yolov9c.pt`).
- `include/ESRGAN/` — RealESRGAN models used by UltimateSDUpscale in Img2Img workflows.
- `include/sd1_tokenizer/` — tokenizer files for SD1.x. The repository already includes the defaults.
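One way to fetch the Flux VAE, assuming you have `huggingface_hub` (which provides `huggingface-cli`) installed:

```bash
# Downloads ae.safetensors from the FLUX.1-schnell repository into include/vae/.
huggingface-cli download black-forest-labs/FLUX.1-schnell ae.safetensors --local-dir include/vae
```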
Generated outputs are written to `output/` (separated into Classic, Flux, Img2Img, HiresFix and ADetailer sub-folders). These folders are created automatically during the first run.
## Optional accelerations
- Stable-Fast — 70% faster SD1.5 inference through UNet compilation. Set `INSTALL_STABLE_FAST=1` in Docker or pass `--stable-fast` in the CLI/UI to compile on demand. Compilation adds a one-time warm-up cost.
- SageAttention — INT8 attention kernels with a 15% speedup and lower VRAM use. Built automatically in Docker images; on bare metal, clone SageAttention and run `pip install -e . --no-build-isolation` inside your environment (see the sketch after this list).
- SpargeAttn — sparse attention kernels with a 40–60% speedup (compute capability 8.0–9.0 GPUs only). Build from SpargeAttn using `TORCH_CUDA_ARCH_LIST="8.9"` or similar.
- Ollama prompt enhancer — install Ollama and pull `qwen3:0.6b`. Set `PROMPT_ENHANCER_MODEL=qwen3:0.6b` before launching LightDiffusion-Next to enable the automatic prompt rewrite toggle.
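A bare-metal build sketch, assuming the upstream repositories under the `thu-ml` GitHub organization, an activated virtual environment, and the CUDA 12 toolkit on your `PATH`:

```bash
# SageAttention: INT8 attention kernels.
git clone https://github.com/thu-ml/SageAttention.git
pip install -e SageAttention --no-build-isolation

# SpargeAttn: sparse attention; pin the arch list to your GPU (compute 8.0-9.0 only).
git clone https://github.com/thu-ml/SpargeAttn.git
TORCH_CUDA_ARCH_LIST="8.9" pip install -e SpargeAttn --no-build-isolation

# Ollama prompt enhancer.
ollama pull qwen3:0.6b
export PROMPT_ENHANCER_MODEL=qwen3:0.6b
```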
## Verify your installation
- Start the UI or FastAPI server.
- Watch the startup logs — the initialization progress bar runs the dependency download routine (`CheckAndDownload`) and loads the default checkpoint.
- Generate a 512×512 image with the default prompt. The status bar shows timing and the output appears in `output/Classic`.
- Confirm the health and telemetry endpoints are reachable:

  ```bash
  curl http://localhost:7861/health
  curl http://localhost:7861/api/telemetry
  ```
## Updating or rebuilding
- Pull the latest Git changes and rerun `uv pip install -r requirements.txt` in the virtual environment.
- Docker users should rebuild with `docker-compose build --no-cache` to pick up updates.
- If you upgraded your GPU driver or CUDA toolkit, delete `~/.cache/torch_extensions` to force SageAttention/SpargeAttn to recompile.
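Put together, a typical bare-metal update cycle looks like this:

```bash
git pull
source .venv/bin/activate
uv pip install -r requirements.txt
# Only needed after a driver or CUDA toolkit upgrade:
rm -rf ~/.cache/torch_extensions
```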
You are now ready to explore the UI guide and start generating.