# REST API & Automation (Quick Reference)
LightDiffusion-Next ships with a FastAPI service (`server.py`) that sits in front of the shared pipeline. It batches compatible requests, streams telemetry and exposes health probes so you can plug the system into automation workflows, bots or orchestrators.
## Common endpoints
| Method | Path | Description |
| --- | --- | --- |
| `GET` | `/health` | Lightweight readiness probe. Returns `{ "status": "ok" }` when the server is reachable. |
| `GET` | `/api/telemetry` | Queue and VRAM telemetry: batching stats, pending requests, cache state, uptime. |
| `POST` | `/api/generate` | Submit a generation job. Requests are buffered, batched when signatures match and resolved asynchronously. |
The service listens on port `7861` by default. Launch it with:
```fish
uvicorn server:app --host 0.0.0.0 --port 7861
```
## Payload schema (`/api/generate`)
```json
{
  "prompt": "string",
  "negative_prompt": "string",
  "width": 512,
  "height": 512,
  "num_images": 1,
  "batch_size": 1,
  "scheduler": "ays",
  "sampler": "dpmpp_sde_cfgpp",
  "steps": 20,
  "hires_fix": false,
  "adetailer": false,
  "enhance_prompt": false,
  "img2img_enabled": false,
  "img2img_image": null,
  "stable_fast": false,
  "reuse_seed": false,
  "flux_enabled": false,
  "realistic_model": false,
  "multiscale_enabled": true,
  "multiscale_intermittent": true,
  "multiscale_factor": 0.5,
  "multiscale_fullres_start": 10,
  "multiscale_fullres_end": 8,
  "keep_models_loaded": true,
  "enable_preview": false,
  "preview_fidelity": "balanced",
  "guidance_scale": null,
  "seed": null
}
```
Not all fields are required; only `prompt`, `width`, `height` and `num_images` are strictly necessary. Unknown keys are ignored, keeping the endpoint forward-compatible with new UI features.
### Response format
Successful requests return either:
```json
{ "image": "<base64-png>" }
```
or, if multiple images were requested:
```json
{ "images": ["<base64-png>", "<base64-png>"] }
```
Each Base64 string is a PNG file with embedded metadata identical to the Streamlit UI output; decode it and write the bytes to disk.
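For example, a minimal round trip in Python, assuming the `requests` package is installed (the prompt, timeout and output filenames are illustrative):
```python
import base64
import requests

resp = requests.post(
    "http://localhost:7861/api/generate",
    json={
        "prompt": "painted nebula over distant mountains",
        "width": 512,
        "height": 512,
        "num_images": 1,
    },
    timeout=600,  # generation can take minutes; tune for your hardware
)
resp.raise_for_status()
payload = resp.json()

# Normalise the two response shapes into one list of base64 PNG strings.
encoded = payload["images"] if "images" in payload else [payload["image"]]
for index, data in enumerate(encoded):
    with open(f"result_{index}.png", "wb") as handle:
        handle.write(base64.b64decode(data))
```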
### Img2Img uploads
When `img2img_enabled` is `true`, `img2img_image` may be provided as any of the following:
- A local file path (e.g., `"tests/test.png"`)
- A data URL (e.g., `"data:image/png;base64,<...>"`)
- A raw Base64-encoded PNG string
The server will decode data URLs and raw Base64 strings and save them to the system temporary directory before processing (default max upload size: 10 MB). Keep payloads under a few megabytes to avoid HTTP timeouts.
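As a sketch, here is an Img2Img submission using the data-URL form (assuming `requests`; the prompt is illustrative and `tests/test.png` is the sample path from above):
```python
import base64
import requests

# Build a data URL from a local PNG; the server also accepts plain file
# paths and raw base64 strings, as listed above.
with open("tests/test.png", "rb") as handle:
    data_url = "data:image/png;base64," + base64.b64encode(handle.read()).decode("ascii")

resp = requests.post(
    "http://localhost:7861/api/generate",
    json={
        "prompt": "watercolor rework of the input image",
        "width": 512,
        "height": 512,
        "num_images": 1,
        "img2img_enabled": True,
        "img2img_image": data_url,
    },
    timeout=600,
)
resp.raise_for_status()
```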
## Telemetry shape (`/api/telemetry`)
The telemetry endpoint returns operational stats that help with autoscaling or queue dashboards. Example snippet:
```json
{
  "uptime_seconds": 1234.56,
  "pending_count": 2,
  "pending_by_signature": {
    "(False, 512, 512, True, False, False, True, True, 0.5, 10, 8, False, True, False)": 2
  },
  "pending_preview": [
    {"request_id": "a1b2c3d4", "waiting_s": 0.42, "prompt_preview": "a cinematic robot..."}
  ],
  "max_batch_size": 4,
  "max_images_per_group": 256,
  "batch_timeout": 0.5,
  "batches_processed": 12,
  "items_processed": 24,
  "requests_processed": 12,
  "avg_processed_wait_s": 0.31,
  "pending_avg_wait_s": 0.12,
  "memory_info": {
    "vram_allocated_mb": 5623,
    "vram_reserved_mb": 6144,
    "system_ram_mb": 12345
  },
  "loaded_models_count": 2,
  "loaded_models": ["SD15 UNet", "SD15 VAE"],
  "pipeline_import_ok": true,
  "pipeline_import_error": null
}
```
Use this data to spot batching mismatches (different signatures cannot be coalesced), monitor VRAM usage or expose metrics to Prometheus/Grafana.
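As a concrete example, a small poller that flags batching mismatches and reports VRAM usage (field names are taken from the sample above; the warning logic is illustrative):
```python
import requests

TELEMETRY_URL = "http://localhost:7861/api/telemetry"

stats = requests.get(TELEMETRY_URL, timeout=5).json()

# More than one pending signature means those jobs cannot share a batch.
signatures = stats.get("pending_by_signature", {})
if len(signatures) > 1:
    print(f"{stats['pending_count']} pending jobs split across "
          f"{len(signatures)} signatures; they will run as separate batches.")

memory = stats.get("memory_info", {})
print(f"VRAM allocated: {memory.get('vram_allocated_mb', 'n/a')} MB")
```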
## Queue tuning knobs
The queue accepts a few environment variables that influence behaviour:
| Variable | Default | Effect |
| --- | --- | --- |
| `LD_MAX_BATCH_SIZE` | `4` | Maximum items processed together when signatures match. |
| `LD_BATCH_TIMEOUT` | `0.5` | Seconds to wait before flushing a batch. |
| `LD_BATCH_WAIT_SINGLETONS` | `0` | If `1`, single jobs wait the timeout hoping for companions. Set to `0` to process singletons immediately. |
| `LD_MAX_IMAGES_PER_GROUP` | `256` | Maximum combined images processed in a single pipeline run when coalescing multiple requests. Groups larger than this are processed sequentially in smaller chunks to avoid memory and disk pressure. |
| `LD_MAX_IMAGES_PER_SAVE` | `16` | Maximum images allowed in a single `save_images` call. Saves that exceed this limit are aborted to avoid creating an excessive number of tile files. |
| `LD_SERVER_LOGLEVEL` | `DEBUG` | Logging verbosity for `logs/server.log`. |
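For instance, a supervisor sketch that launches the server with tuned values (the numbers are illustrative, not recommendations):
```python
import os
import subprocess

# Illustrative values, not recommendations: larger batches, faster flushes.
env = {
    **os.environ,
    "LD_MAX_BATCH_SIZE": "8",
    "LD_BATCH_TIMEOUT": "0.25",
    "LD_BATCH_WAIT_SINGLETONS": "0",
}
subprocess.run(
    ["uvicorn", "server:app", "--host", "0.0.0.0", "--port", "7861"],
    env=env,
    check=True,
)
```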
## Deploying behind a reverse proxy
When hosting remotely:
- Front the FastAPI app with Nginx/Caddy and increase client body size if you accept Img2Img uploads.
- Expose `/health` for liveness checks and `/api/telemetry` for readiness/autoscaling gates (see the sketch after this list).
- Mount `./include`, `./output` and `~/.cache/torch_extensions` as volumes so workers share models, outputs and compiled kernels.
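For the readiness gate mentioned above, a minimal sketch (assuming `requests`; `max_pending` is an illustrative threshold):
```python
import requests

def is_ready(base_url="http://localhost:7861", max_pending=8):
    """Gate traffic on pipeline health and queue depth (threshold is illustrative)."""
    try:
        stats = requests.get(f"{base_url}/api/telemetry", timeout=2).json()
    except requests.RequestException:
        return False
    return stats.get("pipeline_import_ok", False) and stats.get("pending_count", 0) <= max_pending
```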
## Testing the service quickly
```fish
# Send a simple generation job
curl -X POST http://localhost:7861/api/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "painted nebula over distant mountains", "width": 512, "height": 512, "num_images": 1}' \
  | jq -r '.image' | base64 -d > nebula.png

# Inspect queue state
curl http://localhost:7861/api/telemetry | jq
```
That’s it! Check the [Troubleshooting guide](quirks.md) if the service reports missing models or the queue appears stalled.