# Quirks & Troubleshooting

This playbook highlights the most common operational quirks you may encounter while running LightDiffusion-Next and the quickest ways to resolve them.

## GPU memory headaches

| Symptom | Likely cause | Quick fixes |
| --- | --- | --- |
| `CUDA out of memory` during base diffusion | Resolution or batch size too high | Drop to 512×512 or smaller, decrease the batch size to 1, disable HiresFix or AutoDetailer, prefer Euler/Karras samplers in **CFG++** mode |
| OOM triggered midway through HiRes | VRAM spikes when loading the VAE/second UNet | Enable **Keep models loaded** (to avoid reloading) or run HiRes on CPU by toggling *VAE on CPU* in settings |
| Flux runs crash immediately | Missing Flux decoder or running on <16 GB VRAM | Place the Flux weights in `include/Flux`; disable Flux or use the SD1.5 profile on smaller cards |

Additional tips:

- Enable **VRAM budget** in Streamlit to see live usage (requires `LD_SHOW_VRAM=1`).
- In Docker, pass `--gpus all` and ensure `NVIDIA_VISIBLE_DEVICES` is not empty.
- Clear `~/.cache/torch_extensions` if Stable-Fast kernels were compiled against an older driver and now fail to load.

## Slow first runs or repeated recompilation

- Stable-Fast and SageAttention compile custom kernels on first use, which can take several minutes. Once complete, the compiled artifacts live under `~/.cache/torch_extensions` (host) or `/root/.cache/torch_extensions` (Docker). Mount this directory as a volume for faster cold starts.
- If Streamlit recompiles on every launch, ensure the container or user has write access to the cache directory and that the system clock is correct.
- Set `LD_DISABLE_SAGE_ATTENTION=1` to isolate issues related specifically to SageAttention.

## Downloader complaints about missing assets

- The startup checks look for standard filenames (e.g., `yolov8n.pt`, `taesdxl_decoder.safetensors`). Verify these live under the correct subdirectories in `include/`.
- For offline setups, drop the files in place manually and create empty `.ok` sentinels (e.g., `include/checkpoints/.downloads-ok`) to skip the prompts.
- Hugging Face rate limits manifest as HTTP 429. Provide a token via the prompt, set `HF_TOKEN` in the environment, or download the files manually.

## Streamlit UI quirks

- **Preview stuck on “Waiting for GPU”** – Check the FastAPI logs; the batching worker may be paused. Restart the Streamlit session or run `python server.py` to inspect queue telemetry.
- **Settings reset on restart** – Ensure the process can write to `webui_settings.json`. Remove the file to revert to defaults if it becomes corrupted.
- **History thumbnails missing** – Delete the entry under `ui/history/`; the next render will recreate the previews.

## Gradio or API automation issues

- `/api/generate` returns 500 with “No images produced”: inspect the server logs for `Pipeline import error` or missing models. Ensure `pipeline.py` is importable and the working directory is the repository root.
- Jobs appear stuck: call `/api/telemetry` to inspect `pending_by_signature`. Mixed resolutions or toggles prevent batching; for single-job automation, set `LD_BATCH_WAIT_SINGLETONS=0` to avoid coalescing delays.
- SaveImage aborts with "Attempting to save N images in a single call" (exceeds `MAX_IMAGES_PER_SAVE`): this usually indicates tiled intermediate outputs or a very large batched tensor. The server mitigates this by chunking large coalesced groups into runs of at most `LD_MAX_IMAGES_PER_GROUP` images (default: 256); the second sketch after this list illustrates the chunking. If you must allow larger single-call saves, set `LD_MAX_IMAGES_PER_SAVE` to a higher value in the server environment (e.g., `export LD_MAX_IMAGES_PER_SAVE=256`), but be mindful of disk usage. Alternatively, reduce `num_images` per job or lower `LD_MAX_BATCH_SIZE` to keep groups smaller.
- Health checks: `/health` returns `{ "status": "ok" }`. If it fails, the FastAPI app has likely crashed; restart it and inspect `logs/server.log`. The first sketch after this list is a minimal smoke test covering these endpoints.
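To sanity-check the automation path end to end, here is a minimal smoke test against the endpoints above. It is a sketch, not a canonical client: the base URL/port and the payload field names (`prompt`, `width`, `height`, `num_images`) are assumptions, so check your server's actual request schema before relying on them.

```python
"""Minimal API smoke test. Assumes the FastAPI server listens on
localhost:8000; the payload field names are illustrative guesses."""
import requests

BASE = "http://localhost:8000"  # assumption: adjust to your deployment

# 1. Confirm the app is alive before queueing work.
health = requests.get(f"{BASE}/health", timeout=5).json()
assert health.get("status") == "ok", f"server unhealthy: {health}"

# 2. Queue a single job. Keep resolution and toggles identical across jobs so
#    the batching worker can coalesce them (mixed signatures prevent batching).
payload = {"prompt": "a lighthouse at dusk", "width": 512, "height": 512, "num_images": 1}
resp = requests.post(f"{BASE}/api/generate", json=payload, timeout=600)
if resp.status_code == 500:
    # "No images produced" usually means a pipeline import error or missing models.
    print("generation failed:", resp.text)

# 3. If jobs appear stuck, inspect what the batcher is waiting on.
telemetry = requests.get(f"{BASE}/api/telemetry", timeout=5).json()
print("pending_by_signature:", telemetry.get("pending_by_signature"))
```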
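For intuition about the SaveImage mitigation, the following sketches the group-chunking rule using only the environment variables named above; `chunk_for_save` is a hypothetical helper for illustration, not the server's actual function.

```python
"""Sketch of the group-chunking mitigation: a coalesced group larger than
LD_MAX_IMAGES_PER_GROUP is split into smaller save runs."""
import os

def chunk_for_save(images: list) -> list[list]:
    # Hypothetical helper mirroring the documented behaviour (default cap: 256).
    cap = int(os.environ.get("LD_MAX_IMAGES_PER_GROUP", "256"))
    return [images[i:i + cap] for i in range(0, len(images), cap)]

# A 600-image coalesced group becomes three runs, each within the cap.
runs = chunk_for_save(list(range(600)))
print([len(r) for r in runs])  # [256, 256, 88]
```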
## Docker-specific notes

- Always build with the provided `Dockerfile` to get the SageAttention patches precompiled.
- Forward model assets by mounting `./include` into the container (`-v $(pwd)/include:/app/include`).
- On Windows + WSL2, ensure the WSL distro has the NVIDIA driver bridge (`wsl --status`).

## Logging & diagnostics

- Server logs live under `logs/server.log` with per-request IDs. Tail them during load testing: `tail -f logs/server.log`.
- Enable debug logging by exporting `LD_SERVER_LOGLEVEL=DEBUG` before launching Streamlit/Gradio/uvicorn.
- To inspect queue depth without hitting the API, watch the `GenerationBuffer` logs; each batch prints a signature summary.

## When all else fails

- Clear the `include/last_seed.txt` file if seed reuse behaves unexpectedly.
- Regenerate the Stable-Fast kernels by deleting the cache directory and re-running with `stable_fast` enabled (see the first sketch below).
- Collect the following before opening an issue: GPU model, driver version, operating system, a copy of `logs/server.log`, hardware info from `/api/telemetry`, and reproduction steps. The second sketch below automates most of that checklist.
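A small sketch of the kernel-cache reset, assuming the host cache path mentioned earlier (inside Docker, target `/root/.cache/torch_extensions` instead):

```python
"""Wipe the compiled-kernel cache so Stable-Fast/SageAttention rebuild on the
next run with stable_fast enabled."""
import shutil
from pathlib import Path

cache = Path.home() / ".cache" / "torch_extensions"  # host path; Docker differs
if cache.exists():
    shutil.rmtree(cache)  # the next launch recompiles the kernels
    print(f"cleared {cache}")
else:
    print(f"nothing to clear at {cache}")
```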
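And a sketch that bundles most of the issue-report checklist into a single JSON file. It assumes `nvidia-smi` is on `PATH` and that the API listens on `localhost:8000`; the script is illustrative and not part of the repository.

```python
"""Collect GPU model, driver version, OS, a log tail, and telemetry into
issue_report.json for attaching to a bug report."""
import json
import platform
import subprocess
from pathlib import Path

import requests

report = {"os": platform.platform()}

# GPU model and driver version via nvidia-smi's CSV query mode.
gpu = subprocess.run(
    ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
    capture_output=True, text=True,
)
report["gpu"] = gpu.stdout.strip()

# The last 200 log lines are usually enough to show the failing request.
log = Path("logs/server.log")
report["log_tail"] = log.read_text().splitlines()[-200:] if log.exists() else []

try:
    report["telemetry"] = requests.get("http://localhost:8000/api/telemetry", timeout=5).json()
except requests.RequestException as exc:
    report["telemetry"] = f"unavailable: {exc}"

Path("issue_report.json").write_text(json.dumps(report, indent=2))
print("wrote issue_report.json")
```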