# Quirks & Troubleshooting

This playbook highlights the most common operational quirks you may encounter while running LightDiffusion-Next and the quickest ways to resolve them.

## GPU memory headaches

| Symptom | Likely cause | Quick fixes |
| --- | --- | --- |
| `CUDA out of memory` during base diffusion | Resolution or batch too high | Drop to 512×512 or smaller, decrease batch to 1, disable HiresFix or AutoDetailer, prefer Euler/Karras samplers in **CFG++** mode |
| OOM triggered midway through HiRes | VRAM spikes when loading the VAE/second UNet | Enable **Keep models loaded** (to avoid reloading) or run HiRes on CPU by toggling *VAE on CPU* in settings |
| Flux runs crash immediately | Missing Flux decoder or running on <16 GB VRAM | Place Flux weights in `include/Flux`, disable Flux, or use the SD1.5 profile on smaller cards |
Additional tips:

- Enable **VRAM budget** in Streamlit to see live usage (requires `LD_SHOW_VRAM=1`).
- In Docker, pass `--gpus all` and ensure `NVIDIA_VISIBLE_DEVICES` is not empty.
- Clear `~/.cache/torch_extensions` if Stable-Fast kernels were compiled against an older driver and now fail to load.
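The cache-clearing tip above can be scripted defensively; the path is the default host location noted in this section (inside Docker it is `/root/.cache/torch_extensions`):

```shell
# Remove stale Stable-Fast kernel caches compiled against an older driver.
CACHE_DIR="${HOME}/.cache/torch_extensions"
if [ -d "$CACHE_DIR" ]; then
  rm -rf "$CACHE_DIR"
  echo "removed $CACHE_DIR"
else
  echo "no kernel cache at $CACHE_DIR"
fi
```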
## Slow first runs or repeated recompilation

- Stable-Fast and SageAttention compile custom kernels on first use. This can take several minutes. Once complete, the compiled artifacts live under `~/.cache/torch_extensions` (host) or `/root/.cache/torch_extensions` (Docker). Mount this directory as a volume for faster cold starts.
- If Streamlit re-compiles every launch, ensure the container or user has write access to the cache directory and that the system clock is correct.
- Set `LD_DISABLE_SAGE_ATTENTION=1` to isolate issues related specifically to SageAttention.
## Downloader complaints about missing assets

- The startup checks look for standard filenames (e.g., `yolov8n.pt`, `taesdxl_decoder.safetensors`). Verify these live under the correct subdirectories in `include/`.
- For offline setups, drop the files manually and create empty `.ok` sentinels (e.g., `include/checkpoints/.downloads-ok`) to skip prompts.
- Hugging Face rate limits manifest as HTTP 429. Provide a token via the prompt, set `HF_TOKEN` in the environment, or download the files manually.
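The offline bootstrap described above amounts to laying out `include/` by hand and creating the sentinel file; the sentinel name is from this guide, and any additional subdirectories depend on which models you use:

```shell
# Create the include/ layout manually and add the sentinel so startup
# checks skip download prompts (sentinel name per the tip above).
mkdir -p include/checkpoints include/Flux
touch include/checkpoints/.downloads-ok
```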
## Streamlit UI quirks

- **Preview stuck on “Waiting for GPU”** – Check FastAPI logs; the batching worker may be paused. Restart the Streamlit session or run `python server.py` to inspect queue telemetry.
- **Settings reset on restart** – Ensure the process can write to `webui_settings.json`. Remove the file to revert to defaults if it becomes corrupted.
- **History thumbnails missing** – Delete the entry under `ui/history/<timestamp>`; the next render will recreate previews.
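A defensive way to apply the settings-reset tip above (the filename is from this section; run it from the repository root):

```shell
# Remove a corrupted settings file; the UI recreates defaults on next launch.
if [ -f webui_settings.json ]; then
  rm webui_settings.json
  echo "settings file removed; defaults will be recreated"
else
  echo "no settings file found"
fi
```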
## Gradio or API automation issues

- `/api/generate` returns 500 with “No images produced”: inspect server logs for `Pipeline import error` or missing models. Ensure `pipeline.py` is importable and the working directory is the repository root.
- Jobs appear stuck: call `/api/telemetry` to inspect `pending_by_signature`. Mixed resolutions or toggles prevent batching; if running single-job automation, set `LD_BATCH_WAIT_SINGLETONS=0` to avoid coalescing delays.
- SaveImage aborts with "Attempting to save N images in a single call" (exceeds `MAX_IMAGES_PER_SAVE`): this usually indicates tiled intermediate outputs or a very large batched tensor. The server mitigates this by chunking large coalesced groups into smaller runs of at most `LD_MAX_IMAGES_PER_GROUP` images (default: 256). If you must allow larger single-call saves, set `LD_MAX_IMAGES_PER_SAVE` to a higher value in the server environment (e.g., `export LD_MAX_IMAGES_PER_SAVE=256`), but be mindful of disk usage. Alternatively, reduce `num_images` per job or lower `LD_MAX_BATCH_SIZE` to keep groups smaller.
- Health checks: `/health` returns `{ "status": "ok" }`. If it fails, the FastAPI app likely crashed; restart and inspect `logs/server.log`.
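The SaveImage grouping behaviour described above can be sketched as follows; `chunk_sizes` is a hypothetical helper for illustration, not the server's actual function:

```python
import os

def chunk_sizes(num_images: int) -> list[int]:
    """Split a coalesced batch into runs of at most LD_MAX_IMAGES_PER_GROUP
    images (default 256), mirroring the documented server behaviour."""
    limit = int(os.environ.get("LD_MAX_IMAGES_PER_GROUP", "256"))
    return [min(limit, num_images - start) for start in range(0, num_images, limit)]
```

For example, with the default limit, `chunk_sizes(600)` yields `[256, 256, 88]`: two full groups plus a remainder, so no single save call exceeds the cap.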
## Docker-specific notes

- Always build with the provided `Dockerfile` to get SageAttention patches precompiled.
- Forward model assets by mounting `./include` into the container (`-v $(pwd)/include:/app/include`).
- On Windows + WSL2, ensure the WSL distro has the NVIDIA driver bridge (`wsl --status`).
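Putting the notes above together with the kernel-cache tip, a typical launch might look like the following; the image name `lightdiffusion-next` is an assumption, so substitute whatever tag your build produced:

```shell
# Illustrative docker run: GPU access, model assets mounted, and the
# kernel cache persisted so cold starts stay fast. Image name is assumed.
docker run --gpus all \
  -v "$(pwd)/include:/app/include" \
  -v "$HOME/.cache/torch_extensions:/root/.cache/torch_extensions" \
  lightdiffusion-next
```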
## Logging & diagnostics

- Server logs live under `logs/server.log` with per-request IDs. Tail them during load testing: `tail -f logs/server.log`.
- Enable debug logging by exporting `LD_SERVER_LOGLEVEL=DEBUG` before launching Streamlit/Gradio/uvicorn.
- To inspect queue depth without hitting the API, watch the `GenerationBuffer` logs; each batch prints signature summaries.
## When all else fails

- Clear the `include/last_seed.txt` file if seed reuse behaves unexpectedly.
- Regenerate Stable-Fast kernels by deleting the cache directory and re-running with `stable_fast` enabled.
- Collect the following before opening an issue: GPU model, driver version, operating system, a copy of `logs/server.log`, hardware info from `/api/telemetry`, and reproduction steps.
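A hypothetical helper for the collection step above; the file locations follow this guide, while `OUT` and `API_BASE` (including the port) are assumptions to adjust per deployment:

```shell
# Bundle the diagnostics listed above into one archive for a bug report.
OUT=diagnostics
API_BASE="${API_BASE:-http://localhost:8000}"   # assumed default, adjust
mkdir -p "$OUT"
uname -a > "$OUT/system.txt"
nvidia-smi > "$OUT/gpu.txt" 2>/dev/null || echo "nvidia-smi unavailable" > "$OUT/gpu.txt"
cp logs/server.log "$OUT/" 2>/dev/null || true
curl -s "$API_BASE/api/telemetry" -o "$OUT/telemetry.json" 2>/dev/null || true
tar czf diagnostics.tar.gz "$OUT"
echo "wrote diagnostics.tar.gz"
```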