# Quirks & Troubleshooting
This playbook highlights the most common operational quirks you may encounter while running LightDiffusion-Next and the quickest ways to resolve them.
## GPU memory headaches
| Symptom | Likely cause | Quick fixes |
| --- | --- | --- |
| `CUDA out of memory` during base diffusion | Resolution or batch too high | Drop to 512×512 or smaller, decrease batch to 1, disable HiresFix or AutoDetailer, prefer Euler/Karras samplers in **CFG++** mode |
| OOM triggered mid-way through HiRes | VRAM spikes when loading VAE/second UNet | Enable **Keep models loaded** (to avoid reloading) or run HiRes on CPU by toggling *VAE on CPU* in settings |
| Flux runs crash immediately | Missing Flux decoder or <16 GB VRAM | Place Flux weights in `include/Flux`; on smaller cards, disable Flux or fall back to the SD1.5 profile |
Additional tips:
- Enable **VRAM budget** in Streamlit to see live usage (requires `LD_SHOW_VRAM=1`).
- In Docker, pass `--gpus all` and ensure `NVIDIA_VISIBLE_DEVICES` is not empty.
- Clear `~/.cache/torch_extensions` if Stable-Fast kernels were compiled against an older driver and now fail to load.
## Slow first runs or repeated recompilation
- Stable-Fast and SageAttention compile custom kernels on first use. This can take several minutes. Once complete, the compiled artifacts live under `~/.cache/torch_extensions` (host) or `/root/.cache/torch_extensions` (Docker). Mount this directory as a volume for faster cold starts.
- If Streamlit re-compiles every launch, ensure the container or user has write access to the cache directory and that the system clock is correct.
- Set `LD_DISABLE_SAGE_ATTENTION=1` to isolate issues related specifically to SageAttention.
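If stale kernels keep failing to load (or you want fast cold starts to reset cleanly), the cache can be cleared from a script instead of by hand. A minimal sketch; the default path matches the host cache location above, and `clear_kernel_cache` is an illustrative helper, not part of LightDiffusion-Next:

```python
import shutil
from pathlib import Path

def clear_kernel_cache(cache_dir="~/.cache/torch_extensions"):
    """Delete the compiled-kernel cache so Stable-Fast/SageAttention
    rebuild their kernels on the next launch.

    Returns True if a cache directory existed and was removed.
    """
    cache = Path(cache_dir).expanduser()
    if not cache.is_dir():
        return False
    shutil.rmtree(cache)
    return True
```

Call `clear_kernel_cache()` before relaunching with `stable_fast` enabled; inside Docker, point it at `/root/.cache/torch_extensions` instead.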
## Downloader complaints about missing assets
- The startup checks look for standard filenames (e.g., `yolov8n.pt`, `taesdxl_decoder.safetensors`). Verify these live under the correct subdirectories in `include/`.
- For offline setups, drop the files in place manually and create empty sentinel files (e.g., `include/checkpoints/.downloads-ok`) to skip the download prompts.
- Hugging Face rate limits manifest as HTTP 429 responses. Provide a token via the prompt, set `HF_TOKEN` in the environment, or download the files manually.
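For scripted offline setups, the startup checks can be approximated locally before launch. A hedged sketch: the filenames come from the checks above, but the subdirectory names in `REQUIRED_ASSETS` and both helpers are illustrative, so adjust them to your actual `include/` layout:

```python
from pathlib import Path

# Illustrative subdirectory -> filename mapping; the real layout under
# include/ may differ, so verify against your installation.
REQUIRED_ASSETS = {
    "yolos": "yolov8n.pt",
    "vae_approx": "taesdxl_decoder.safetensors",
}

def verify_assets(include_dir="include"):
    """Return the list of expected asset paths that are missing."""
    root = Path(include_dir)
    return [str(root / sub / name)
            for sub, name in REQUIRED_ASSETS.items()
            if not (root / sub / name).is_file()]

def write_sentinel(include_dir="include"):
    """Create the empty sentinel that tells the downloader to skip prompts."""
    sentinel = Path(include_dir) / "checkpoints" / ".downloads-ok"
    sentinel.parent.mkdir(parents=True, exist_ok=True)
    sentinel.touch()
    return sentinel
```

Run `verify_assets()` after copying files over, and only call `write_sentinel()` once it returns an empty list.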
## Streamlit UI quirks
- **Preview stuck on “Waiting for GPU”** – Check FastAPI logs; the batching worker may be paused. Restart the Streamlit session or run `python server.py` to inspect queue telemetry.
- **Settings reset on restart** – Ensure the process can write to `webui_settings.json`. Remove the file to revert to defaults if it becomes corrupted.
- **History thumbnails missing** – Delete the entry under `ui/history/<timestamp>`; the next render will recreate previews.
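If `webui_settings.json` repeatedly gets corrupted, a small pre-flight check can remove it automatically so the UI falls back to defaults. A sketch under the assumption that the file is plain JSON; `reset_if_corrupt` is a hypothetical helper, not project code:

```python
import json
from pathlib import Path

def reset_if_corrupt(path="webui_settings.json"):
    """Delete the settings file if it is not valid JSON, forcing defaults.

    Returns True if the file was removed.
    """
    p = Path(path)
    if not p.is_file():
        return False
    try:
        json.loads(p.read_text())
        return False  # file parses cleanly; leave it alone
    except (json.JSONDecodeError, UnicodeDecodeError):
        p.unlink()
        return True
```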
## Gradio or API automation issues
- `/api/generate` returns 500 with “No images produced”: inspect server logs for `Pipeline import error` or missing models. Ensure `pipeline.py` is importable and the working directory is the repository root.
- Jobs appear stuck: call `/api/telemetry` and inspect `pending_by_signature`. Jobs with mixed resolutions or differing toggles cannot be batched together; for single-job automation, set `LD_BATCH_WAIT_SINGLETONS=0` to avoid coalescing delays.
- SaveImage aborts with "Attempting to save N images in a single call" when a save exceeds `MAX_IMAGES_PER_SAVE`: this usually indicates tiled intermediate outputs or a very large batched tensor. The server mitigates this by chunking large coalesced groups into runs of at most `LD_MAX_IMAGES_PER_GROUP` images (default: 256). If you must allow larger single-call saves, raise `LD_MAX_IMAGES_PER_SAVE` in the server environment (e.g., `export LD_MAX_IMAGES_PER_SAVE=256`), but be mindful of disk usage. Alternatively, reduce `num_images` per job or lower `LD_MAX_BATCH_SIZE` to keep groups smaller.
- Health checks: `/health` returns `{ "status": "ok" }`. If it fails, the FastAPI app likely crashed—restart and inspect `logs/server.log`.
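For automation, the two endpoints above can be combined into a simple watchdog. A minimal stdlib-only sketch; the base URL (host and port) is an assumption for your deployment, and `is_healthy`/`check_server` are illustrative helpers:

```python
import json
from urllib.error import URLError
from urllib.request import urlopen

def is_healthy(payload):
    """Interpret the /health response body ({"status": "ok"} on success)."""
    return payload.get("status") == "ok"

def check_server(base_url="http://127.0.0.1:8000"):
    """Return (healthy, pending_by_signature), or (False, None) if unreachable."""
    try:
        with urlopen(f"{base_url}/health", timeout=5) as resp:
            healthy = is_healthy(json.load(resp))
        with urlopen(f"{base_url}/api/telemetry", timeout=5) as resp:
            pending = json.load(resp).get("pending_by_signature")
        return healthy, pending
    except URLError:
        return False, None
```

If `check_server()` reports unhealthy or unreachable, restart the FastAPI app and inspect `logs/server.log` as described above.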
## Docker-specific notes
- Always build with the provided `Dockerfile` to get SageAttention patches precompiled.
- Share model assets with the container by mounting `./include` (`-v $(pwd)/include:/app/include`).
- On Windows + WSL2, ensure the WSL distro has the NVIDIA driver bridge (`wsl --status`).
## Logging & diagnostics
- Server logs live under `logs/server.log` with per-request IDs. Tail them during load testing: `tail -f logs/server.log`.
- Enable debug logging by exporting `LD_SERVER_LOGLEVEL=DEBUG` before launching Streamlit/Gradio/uvicorn.
- To inspect queue depth without hitting the API, watch the `GenerationBuffer` logs; each batch prints signature summaries.
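Since each line in `logs/server.log` carries a per-request ID, a quick filter helps isolate one job during load testing. A hedged sketch: the exact log format is not specified here, so this just does a substring match on the ID:

```python
from pathlib import Path

def lines_for_request(request_id, log_path="logs/server.log"):
    """Yield log lines mentioning the given request ID (substring match)."""
    with Path(log_path).open() as fh:
        for line in fh:
            if request_id in line:
                yield line.rstrip("\n")
```

For example, `for line in lines_for_request("abc123"): print(line)` replays one request's lifecycle without paging through the full log.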
## When all else fails
- Delete `include/last_seed.txt` if seed reuse behaves unexpectedly.
- Regenerate Stable-Fast kernels by deleting the cache directory and re-running with `stable_fast` enabled.
- Collect the following before opening an issue: GPU model, driver version, operating system, a copy of `logs/server.log`, hardware info from `/api/telemetry`, and reproduction steps.