	# SGLang Diffusion OpenAI API

	The SGLang diffusion HTTP server implements an OpenAI-compatible API for image and video generation, as well as LoRA adapter management.

	## Prerequisites

	- Python 3.11+ if you plan to use the OpenAI Python SDK.

	## Serve

	Launch the server using the `sglang serve` command.

	### Start the server

	```bash
	SERVER_ARGS=(
	  --model-path Wan-AI/Wan2.1-T2V-1.3B-Diffusers
	  --text-encoder-cpu-offload
	  --pin-cpu-memory
	  --num-gpus 4
	  --ulysses-degree=2
	  --ring-degree=2
	  --port 30010
	)

	sglang serve "${SERVER_ARGS[@]}"
	```

	- `--model-path`: Path to the model or model ID.
	- `--port`: HTTP port to listen on (default: `30000`).

	### Get Model Information

	Endpoint: `GET /models`

	Returns information about the model served by this server, including model path, task type, pipeline configuration, and precision settings.

	Curl Example:

	```bash
	curl -sS -X GET "http://localhost:30010/models"
	```

	Response Example:

	```json
	{
	  "model_path": "Wan-AI/Wan2.1-T2V-1.3B-Diffusers",
	  "task_type": "T2V",
	  "pipeline_name": "wan_pipeline",
	  "pipeline_class": "WanPipeline",
	  "num_gpus": 4,
	  "dit_precision": "bf16",
	  "vae_precision": "fp16"
	}
	```
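
The same check can be scripted. The following is a minimal sketch using only the Python standard library, assuming the response carries the fields shown in the example above; `fetch_model_info` and `summarize_model_info` are hypothetical helper names, not part of SGLang.

```python
import json
from urllib.request import urlopen


def fetch_model_info(base_url: str = "http://localhost:30010") -> dict:
    """GET /models and decode the JSON body (requires a running server)."""
    with urlopen(f"{base_url}/models") as resp:
        return json.load(resp)


def summarize_model_info(info: dict) -> str:
    """Build a one-line summary from the fields shown in the response example."""
    return (
        f"{info['model_path']} ({info['task_type']}) on {info['num_gpus']} GPU(s), "
        f"DiT={info['dit_precision']}, VAE={info['vae_precision']}"
    )
```

With a server running, `print(summarize_model_info(fetch_model_info()))` gives a quick confirmation that the expected model and precision settings are loaded before sending generation requests.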

	---

	## Endpoints

	### Image Generation

	The server implements an OpenAI-compatible Images API under the `/v1/images` namespace.

	#### Create an image

	Endpoint: `POST /v1/images/generations`

	Python Example (b64_json response):

	```python
	import base64
	from openai import OpenAI

	client = OpenAI(api_key="sk-proj-1234567890", base_url="http://localhost:30010/v1")

	img = client.images.generate(
	    prompt="A calico cat playing a piano on stage",
	    size="1024x1024",
	    n=1,
	    response_format="b64_json",
	)

	# Decode the base64 payload and write it to disk
	image_bytes = base64.b64decode(img.data[0].b64_json)
	with open("output.png", "wb") as f:
	    f.write(image_bytes)
	```

	Curl Example:

	```bash
	curl -sS -X POST "http://localhost:30010/v1/images/generations" \
	-H "Content-Type: application/json" \
	-H "Authorization: Bearer sk-proj-1234567890" \
	-d '{
	"prompt": "A calico cat playing a piano on stage",
	"size": "1024x1024",
	"n": 1,
	"response_format": "b64_json"
	}'
	```

	> [!NOTE]
	> If `response_format=url` is used and cloud storage is not configured, the API returns
	> a relative URL like `/v1/images/<IMAGE_ID>/content`.

	#### Edit an image

	Endpoint: `POST /v1/images/edits`

	This endpoint accepts a multipart form upload with input images and a text prompt. The server can return either a base64-encoded image or a URL to download the image.

	Curl Example (b64_json response):

	```bash
	curl -sS -X POST "http://localhost:30010/v1/images/edits" \
	-H "Authorization: Bearer sk-proj-1234567890" \
	-F "image=@local_input_image.png" \
	-F "url=image_url.jpg" \
	-F "prompt=A calico cat playing a piano on stage" \
	-F "size=1024x1024" \
	-F "response_format=b64_json"
	```

	Curl Example (URL response):

	```bash
	curl -sS -X POST "http://localhost:30010/v1/images/edits" \
	-H "Authorization: Bearer sk-proj-1234567890" \
	-F "image=@local_input_image.png" \
	-F "url=image_url.jpg" \
	-F "prompt=A calico cat playing a piano on stage" \
	-F "size=1024x1024" \
	-F "response_format=url"
	```

	#### Download image content

	When `response_format=url` is used with `POST /v1/images/generations` or `POST /v1/images/edits`
	and cloud storage is not configured, the API returns a relative URL like `/v1/images/<IMAGE_ID>/content`.

	Endpoint: `GET /v1/images/{image_id}/content`

	Curl Example:

	```bash
	curl -sS -L "http://localhost:30010/v1/images/<IMAGE_ID>/content" \
	-H "Authorization: Bearer sk-proj-1234567890" \
	-o output.png
	```
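
Because the returned URL is relative, a client has to resolve it against the server address before downloading. A minimal standard-library sketch, assuming the bearer token from the examples above; `resolve_content_url` and `download` are hypothetical helper names. If the server returns an absolute URL instead, `urljoin` leaves it unchanged.

```python
from urllib.parse import urljoin
from urllib.request import Request, urlopen


def resolve_content_url(base_url: str, returned_url: str) -> str:
    """Resolve the URL returned by the API against the server address.

    A relative path such as /v1/images/<IMAGE_ID>/content is joined to
    base_url; an absolute URL passes through unchanged.
    """
    return urljoin(base_url, returned_url)


def download(url: str, dest: str, api_key: str = "sk-proj-1234567890") -> None:
    """Fetch the content with a bearer token and write it to dest.

    Requires a running server; not called at import time.
    """
    req = Request(url, headers={"Authorization": f"Bearer {api_key}"})
    with urlopen(req) as resp, open(dest, "wb") as f:
        f.write(resp.read())
```

For example, `download(resolve_content_url("http://localhost:30010", url), "output.png")`, where `url` is the value returned in the generation response.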

	### Video Generation

	The server implements a subset of the OpenAI Videos API under the `/v1/videos` namespace.

	#### Create a video

	Endpoint: `POST /v1/videos`

	Python Example:

	```python
	from openai import OpenAI

	client = OpenAI(api_key="sk-proj-1234567890", base_url="http://localhost:30010/v1")

	video = client.videos.create(
	    prompt="A calico cat playing a piano on stage",
	    size="1280x720",
	)
	print(f"Video ID: {video.id}, Status: {video.status}")
	```

	Curl Example:

	```bash
	curl -sS -X POST "http://localhost:30010/v1/videos" \
	-H "Content-Type: application/json" \
	-H "Authorization: Bearer sk-proj-1234567890" \
	-d '{
	"prompt": "A calico cat playing a piano on stage",
	"size": "1280x720"
	}'
	```

	#### List videos

	Endpoint: `GET /v1/videos`

	Python Example:

	```python
	# Reuses the `client` created in the previous example
	videos = client.videos.list()
	for item in videos.data:
	    print(item.id, item.status)
	```

	Curl Example:

	```bash
	curl -sS -X GET "http://localhost:30010/v1/videos" \
	-H "Authorization: Bearer sk-proj-1234567890"
	```

	#### Download video content

	Endpoint: `GET /v1/videos/{video_id}/content`

	Python Example:

	```python
	import time

	video_id = video.id  # ID returned by the create call above

	# Poll for completion
	while True:
	    page = client.videos.list()
	    item = next((v for v in page.data if v.id == video_id), None)
	    if item and item.status == "completed":
	        break
	    time.sleep(5)

	# Download content
	resp = client.videos.download_content(video_id=video_id)
	with open("output.mp4", "wb") as f:
	    f.write(resp.read())
	```

	Curl Example:

	```bash
	curl -sS -L "http://localhost:30010/v1/videos/<VIDEO_ID>/content" \
	-H "Authorization: Bearer sk-proj-1234567890" \
	-o output.mp4
	```
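
The polling loop in the Python example above runs forever if a job never completes. A more defensive variant might add a timeout and stop on failure. This is a sketch only: `wait_for_video` is a hypothetical helper, and the `"failed"` status is an assumption borrowed from the OpenAI Videos API rather than something this document confirms for the SGLang server.

```python
import time
from typing import Callable, Iterable


def wait_for_video(
    list_videos: Callable[[], Iterable],
    video_id: str,
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
) -> str:
    """Poll until the video is completed, has failed, or the timeout expires.

    list_videos must return objects with .id and .status attributes,
    e.g. lambda: client.videos.list().data
    """
    deadline = time.monotonic() + timeout_s
    while True:
        item = next((v for v in list_videos() if v.id == video_id), None)
        if item is not None and item.status == "completed":
            return item.status
        # Assumption: a failed job reports status "failed".
        if item is not None and item.status == "failed":
            raise RuntimeError(f"video {video_id} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"video {video_id} not completed after {timeout_s}s")
        time.sleep(interval_s)
```

Passing the listing call in as a function keeps the helper independent of the SDK, so it can be exercised with fake data before pointing it at a live server.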

	---

	### LoRA Management

	The server supports dynamic loading, merging, and unmerging of LoRA adapters.

	Important Notes:
	- Mutual Exclusion: Only one LoRA can be merged (active) at a time
	- Switching: To switch LoRAs, you must first `unmerge` the current one, then `set` the new one
	- Caching: The server caches loaded LoRA weights in memory. Switching back to a previously loaded LoRA (same path) has little cost

	#### Set LoRA Adapter

	Loads one or more LoRA adapters and merges their weights into the model. Supports both single LoRA (backward compatible) and multiple LoRA adapters.

	Endpoint: `POST /v1/set_lora`

	Parameters:
	- `lora_nickname` (string or list of strings, required): A unique identifier for the LoRA adapter(s). Can be a single string or a list of strings for multiple LoRAs
	- `lora_path` (string or list of strings/None, optional): Path to the `.safetensors` file(s) or Hugging Face repo ID(s). Required for the first load; optional if re-activating a cached nickname. If a list, must match the length of `lora_nickname`
	- `target` (string or list of strings, optional): Which transformer(s) to apply the LoRA to. If a list, must match the length of `lora_nickname`. Valid values:
	  - `"all"` (default): Apply to all transformers
	  - `"transformer"`: Apply only to the primary transformer (high noise for Wan2.2)
	  - `"transformer_2"`: Apply only to transformer_2 (low noise for Wan2.2)
	  - `"critic"`: Apply only to the critic model
	- `strength` (float or list of floats, optional): LoRA strength for merge, default 1.0. If a list, must match the length of `lora_nickname`. Values < 1.0 reduce the effect, values > 1.0 amplify the effect

	Single LoRA Example:

	```bash
	curl -X POST http://localhost:30010/v1/set_lora \
	-H "Content-Type: application/json" \
	-d '{
	"lora_nickname": "lora_name",
	"lora_path": "/path/to/lora.safetensors",
	"target": "all",
	"strength": 0.8
	}'
	```

	Multiple LoRA Example:

	```bash
	curl -X POST http://localhost:30010/v1/set_lora \
	-H "Content-Type: application/json" \
	-d '{
	"lora_nickname": ["lora_1", "lora_2"],
	"lora_path": ["/path/to/lora1.safetensors", "/path/to/lora2.safetensors"],
	"target": ["transformer", "transformer_2"],
	"strength": [0.8, 1.0]
	}'
	```

	Multiple LoRA with Same Target:

	```bash
	curl -X POST http://localhost:30010/v1/set_lora \
	-H "Content-Type: application/json" \
	-d '{
	"lora_nickname": ["style_lora", "character_lora"],
	"lora_path": ["/path/to/style.safetensors", "/path/to/character.safetensors"],
	"target": "all",
	"strength": [0.7, 0.9]
	}'
	```

	> [!NOTE]
	> When using multiple LoRAs:
	> - All list parameters (`lora_nickname`, `lora_path`, `target`, `strength`) must have the same length
	> - If `target` or `strength` is a single value, it will be applied to all LoRAs
	> - Multiple LoRAs applied to the same target will be merged in order
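
The length rules in the note can be checked on the client before a request is sent. A minimal sketch, assuming the parameter names and valid targets listed above; `build_set_lora_payload` is a hypothetical helper, not part of SGLang.

```python
from typing import Optional, Sequence, Union

VALID_TARGETS = {"all", "transformer", "transformer_2", "critic"}


def build_set_lora_payload(
    nicknames: Sequence[str],
    paths: Sequence[Optional[str]],
    target: Union[str, Sequence[str]] = "all",
    strength: Union[float, Sequence[float]] = 1.0,
) -> dict:
    """Validate the documented length rules and return a JSON-ready body."""
    n = len(nicknames)
    if len(paths) != n:
        raise ValueError("lora_path must match the length of lora_nickname")
    # List-valued target/strength must line up with the nicknames.
    for name, value in (("target", target), ("strength", strength)):
        if isinstance(value, (list, tuple)) and len(value) != n:
            raise ValueError(f"{name} must match the length of lora_nickname")
    targets = target if isinstance(target, (list, tuple)) else [target]
    unknown = set(targets) - VALID_TARGETS
    if unknown:
        raise ValueError(f"unknown target(s): {sorted(unknown)}")
    return {
        "lora_nickname": list(nicknames),
        "lora_path": list(paths),
        # Scalars are sent as-is; the server applies them to every LoRA.
        "target": list(target) if isinstance(target, (list, tuple)) else target,
        "strength": list(strength) if isinstance(strength, (list, tuple)) else strength,
    }
```

The returned dictionary can be serialized with `json.dumps` and sent as the body of `POST /v1/set_lora`.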


	#### Merge LoRA Weights

	Manually merges the currently set LoRA weights into the base model.

	> [!NOTE]
	> `set_lora` automatically performs a merge, so this is typically only needed if you have manually unmerged but want to re-apply the same LoRA without calling `set_lora` again.

	Endpoint: `POST /v1/merge_lora_weights`

	Parameters:
	- `target` (string, optional): Which transformer(s) to merge. One of `"all"` (default), `"transformer"`, `"transformer_2"`, `"critic"`
	- `strength` (float, optional): LoRA strength for merge, default 1.0. Values < 1.0 reduce the effect, values > 1.0 amplify the effect

	Curl Example:

	```bash
	curl -X POST http://localhost:30010/v1/merge_lora_weights \
	-H "Content-Type: application/json" \
	-d '{"strength": 0.8}'
	```


	#### Unmerge LoRA Weights

	Unmerges the currently active LoRA weights from the base model, restoring it to its original state. This must be called before setting a different LoRA.

	Endpoint: `POST /v1/unmerge_lora_weights`

	Curl Example:

	```bash
	curl -X POST http://localhost:30010/v1/unmerge_lora_weights \
	-H "Content-Type: application/json"
	```

	#### List LoRA Adapters

	Returns loaded LoRA adapters and current application status per module.

	Endpoint: `GET /v1/list_loras`

	Curl Example:

	```bash
	curl -sS -X GET "http://localhost:30010/v1/list_loras"
	```

	Response Example:

	```json
	{
	  "loaded_adapters": [
	    { "nickname": "lora_a", "path": "/weights/lora_a.safetensors" },
	    { "nickname": "lora_b", "path": "/weights/lora_b.safetensors" }
	  ],
	  "active": {
	    "transformer": [
	      {
	        "nickname": "lora2",
	        "path": "tarn59/pixel_art_style_lora_z_image_turbo",
	        "merged": true,
	        "strength": 1.0
	      }
	    ]
	  }
	}
	```

	Notes:
	- If LoRA is not enabled for the current pipeline, the server will return an error.
	- `num_lora_layers_with_weights` counts only layers that have LoRA weights applied for the active adapter.
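
To find out which adapters are currently merged, the response can be reduced to a per-module list of nicknames. A minimal sketch, assuming the response shape shown in the example above; `active_loras` is a hypothetical helper name.

```python
def active_loras(response: dict) -> dict:
    """Map each module name to the nicknames of its merged adapters."""
    return {
        module: [entry["nickname"] for entry in entries if entry.get("merged")]
        for module, entries in response.get("active", {}).items()
    }


# Sample data copied from the response example above.
sample = {
    "loaded_adapters": [
        {"nickname": "lora_a", "path": "/weights/lora_a.safetensors"},
        {"nickname": "lora_b", "path": "/weights/lora_b.safetensors"},
    ],
    "active": {
        "transformer": [
            {
                "nickname": "lora2",
                "path": "tarn59/pixel_art_style_lora_z_image_turbo",
                "merged": True,
                "strength": 1.0,
            }
        ]
    },
}

print(active_loras(sample))
```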

	### Example: Switching LoRAs

	1. Set LoRA A:
	```bash
	curl -X POST http://localhost:30010/v1/set_lora -H "Content-Type: application/json" -d '{"lora_nickname": "lora_a", "lora_path": "path/to/A"}'
	```
	2. Generate with LoRA A...
	3. Unmerge LoRA A:
	```bash
	curl -X POST http://localhost:30010/v1/unmerge_lora_weights
	```
	4. Set LoRA B:
	```bash
	curl -X POST http://localhost:30010/v1/set_lora -H "Content-Type: application/json" -d '{"lora_nickname": "lora_b", "lora_path": "path/to/B"}'
	```
	5. Generate with LoRA B...
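
The unmerge-then-set sequence can be wrapped in one helper. A minimal standard-library sketch, assuming a LoRA is currently merged and the server listens on port 30010 as in the examples; `post_json` and `switch_lora` are hypothetical names, and the return value of `post_json` is just the HTTP status code because the response body format is not described here.

```python
import json
from typing import Callable, Optional
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:30010"


def post_json(path: str, payload: Optional[dict] = None) -> int:
    """POST an optional JSON body and return the HTTP status (needs a running server)."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    req = Request(
        BASE_URL + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(req) as resp:
        return resp.status


def switch_lora(nickname: str, path: Optional[str] = None, post: Callable = post_json) -> None:
    """Unmerge the active LoRA, then set (and merge) the new one."""
    post("/v1/unmerge_lora_weights")
    payload = {"lora_nickname": nickname}
    if path is not None:  # may be omitted when re-activating a cached nickname
        payload["lora_path"] = path
    post("/v1/set_lora", payload)
```

Calling `switch_lora("lora_b", "path/to/B")` covers steps 3 and 4; the `post` argument exists so the sequence can be tested without a server.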

	### Adjust Output Quality

	The server supports adjusting output quality and compression levels for both image and video generation through the `output-quality` and `output-compression` parameters.

	#### Parameters

	- `output-quality` (string, optional): Preset quality level that automatically sets compression. Default is `"default"`. Valid values:
	  - `"maximum"`: Highest quality (100)
	  - `"high"`: High quality (90)
	  - `"medium"`: Medium quality (55)
	  - `"low"`: Lower quality (35)
	  - `"default"`: Auto-adjust based on media type (50 for video, 75 for image)

	- `output-compression` (integer, optional): Direct compression level override (0-100). Default is `None`. When provided (not `None`), takes precedence over `output-quality`.
	  - `0`: Lowest quality, smallest file size
	  - `100`: Highest quality, largest file size

	#### Notes

	- Precedence: When both `output-quality` and `output-compression` are provided, `output-compression` takes precedence
	- Format Support: Quality settings apply to JPEG and video formats. PNG uses lossless compression and ignores these settings
	- File Size vs Quality: Lower compression values (or "low" quality preset) produce smaller files but may show visible artifacts
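
The precedence rule and preset values can be summarized in a few lines. This is an illustration of the documented behavior, not the server's actual code; `resolve_compression` is a hypothetical function, and the underscore spellings are only Python identifiers, not a claim about request field names.

```python
from typing import Optional

# Preset values as listed in the Parameters section.
PRESETS = {"maximum": 100, "high": 90, "medium": 55, "low": 35}
# "default" depends on the media type.
DEFAULTS = {"video": 50, "image": 75}


def resolve_compression(
    media_type: str,
    output_quality: str = "default",
    output_compression: Optional[int] = None,
) -> int:
    """Return the effective 0-100 level for a request."""
    # An explicit compression level overrides any quality preset.
    if output_compression is not None:
        if not 0 <= output_compression <= 100:
            raise ValueError("output_compression must be between 0 and 100")
        return output_compression
    if output_quality == "default":
        return DEFAULTS[media_type]
    return PRESETS[output_quality]
```

For instance, a request with quality `"low"` and compression `80` resolves to 80, while a video request with neither parameter set resolves to 50.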