# SGLang Diffusion OpenAI API
The SGLang diffusion HTTP server implements an OpenAI-compatible API for image and video generation, as well as LoRA adapter management.
## Prerequisites
- Python 3.11+ if you plan to use the OpenAI Python SDK.
## Serve
Launch the server using the `sglang serve` command.
### Start the server
```bash
SERVER_ARGS=(
--model-path Wan-AI/Wan2.1-T2V-1.3B-Diffusers
--text-encoder-cpu-offload
--pin-cpu-memory
--num-gpus 4
--ulysses-degree=2
--ring-degree=2
--port 30010
)
sglang serve "${SERVER_ARGS[@]}"
```
- **--model-path**: Path to the model or model ID.
- **--port**: HTTP port to listen on (default: `30000`).
**Get Model Information**
**Endpoint:** `GET /models`
Returns information about the model served by this server, including model path, task type, pipeline configuration, and precision settings.
**Curl Example:**
```bash
curl -sS -X GET "http://localhost:30010/models"
```
**Response Example:**
```json
{
  "model_path": "Wan-AI/Wan2.1-T2V-1.3B-Diffusers",
  "task_type": "T2V",
  "pipeline_name": "wan_pipeline",
  "pipeline_class": "WanPipeline",
  "num_gpus": 4,
  "dit_precision": "bf16",
  "vae_precision": "fp16"
}
```
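As a sketch, the response above can also be fetched and summarized from Python using only the standard library (the `summarize` formatting below is illustrative, not part of the API):

```python
import json
import urllib.request


def fetch_model_info(base_url: str) -> dict:
    """Fetch model metadata from GET /models (fields as shown above)."""
    with urllib.request.urlopen(f"{base_url}/models") as resp:
        return json.load(resp)


def summarize(info: dict) -> str:
    """Render a one-line summary from the /models response fields."""
    return (f"{info['model_path']} ({info['task_type']}) "
            f"on {info['num_gpus']} GPU(s), DiT {info['dit_precision']}")
```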
---
## Endpoints
### Image Generation
The server implements an OpenAI-compatible Images API under the `/v1/images` namespace.
**Create an image**
**Endpoint:** `POST /v1/images/generations`
**Python Example (b64_json response):**
```python
import base64

from openai import OpenAI

client = OpenAI(api_key="sk-proj-1234567890", base_url="http://localhost:30010/v1")

img = client.images.generate(
    prompt="A calico cat playing a piano on stage",
    size="1024x1024",
    n=1,
    response_format="b64_json",
)

image_bytes = base64.b64decode(img.data[0].b64_json)
with open("output.png", "wb") as f:
    f.write(image_bytes)
```
**Curl Example:**
```bash
curl -sS -X POST "http://localhost:30010/v1/images/generations" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-proj-1234567890" \
-d '{
"prompt": "A calico cat playing a piano on stage",
"size": "1024x1024",
"n": 1,
"response_format": "b64_json"
}'
```
> [!NOTE]
> If `response_format=url` is used and cloud storage is not configured, the API returns
> a relative URL like `/v1/images/<IMAGE_ID>/content`.
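A relative URL must be resolved against the server's base URL before downloading. A minimal sketch (the `resolve_image_url` helper is illustrative, not part of any SDK; absolute cloud-storage URLs pass through unchanged):

```python
from urllib.parse import urljoin


def resolve_image_url(server_base: str, url: str) -> str:
    """Turn a relative content URL (e.g. /v1/images/<IMAGE_ID>/content)
    into an absolute URL; absolute URLs are returned as-is."""
    return urljoin(server_base, url)
```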
**Edit an image**
**Endpoint:** `POST /v1/images/edits`
This endpoint accepts a multipart form upload with input images and a text prompt. The server can return either a base64-encoded image or a URL to download the image.
**Curl Example (b64_json response):**
```bash
curl -sS -X POST "http://localhost:30010/v1/images/edits" \
-H "Authorization: Bearer sk-proj-1234567890" \
-F "image=@local_input_image.png" \
-F "url=image_url.jpg" \
-F "prompt=A calico cat playing a piano on stage" \
-F "size=1024x1024" \
-F "response_format=b64_json"
```
**Curl Example (URL response):**
```bash
curl -sS -X POST "http://localhost:30010/v1/images/edits" \
-H "Authorization: Bearer sk-proj-1234567890" \
-F "image=@local_input_image.png" \
-F "url=image_url.jpg" \
-F "prompt=A calico cat playing a piano on stage" \
-F "size=1024x1024" \
-F "response_format=url"
```
**Download image content**
When `response_format=url` is used with `POST /v1/images/generations` or `POST /v1/images/edits`,
the API returns a relative URL like `/v1/images/<IMAGE_ID>/content`.
**Endpoint:** `GET /v1/images/{image_id}/content`
**Curl Example:**
```bash
curl -sS -L "http://localhost:30010/v1/images/<IMAGE_ID>/content" \
-H "Authorization: Bearer sk-proj-1234567890" \
-o output.png
```
### Video Generation
The server implements a subset of the OpenAI Videos API under the `/v1/videos` namespace.
**Create a video**
**Endpoint:** `POST /v1/videos`
**Python Example:**
```python
from openai import OpenAI

client = OpenAI(api_key="sk-proj-1234567890", base_url="http://localhost:30010/v1")

video = client.videos.create(
    prompt="A calico cat playing a piano on stage",
    size="1280x720",
)
print(f"Video ID: {video.id}, Status: {video.status}")
```
**Curl Example:**
```bash
curl -sS -X POST "http://localhost:30010/v1/videos" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-proj-1234567890" \
-d '{
"prompt": "A calico cat playing a piano on stage",
"size": "1280x720"
}'
```
**List videos**
**Endpoint:** `GET /v1/videos`
**Python Example:**
```python
videos = client.videos.list()
for item in videos.data:
    print(item.id, item.status)
```
**Curl Example:**
```bash
curl -sS -X GET "http://localhost:30010/v1/videos" \
-H "Authorization: Bearer sk-proj-1234567890"
```
**Download video content**
**Endpoint:** `GET /v1/videos/{video_id}/content`
**Python Example:**
```python
import time

video_id = video.id  # from the create call above

# Poll until the video finishes rendering
while True:
    page = client.videos.list()
    item = next((v for v in page.data if v.id == video_id), None)
    if item and item.status == "completed":
        break
    time.sleep(5)

# Download the finished video
resp = client.videos.download_content(video_id=video_id)
with open("output.mp4", "wb") as f:
    f.write(resp.read())
```
**Curl Example:**
```bash
curl -sS -L "http://localhost:30010/v1/videos/<VIDEO_ID>/content" \
-H "Authorization: Bearer sk-proj-1234567890" \
-o output.mp4
```
---
### LoRA Management
The server supports dynamic loading, merging, and unmerging of LoRA adapters.
**Important Notes:**
- Mutual Exclusion: Only one LoRA can be *merged* (active) at a time
- Switching: To switch LoRAs, you must first `unmerge` the current one, then `set` the new one
- Caching: The server caches loaded LoRA weights in memory. Switching back to a previously loaded LoRA (same path) has little cost
**Set LoRA Adapter**
Loads one or more LoRA adapters and merges their weights into the model. Supports both a single LoRA (backward compatible) and multiple LoRA adapters.
**Endpoint:** `POST /v1/set_lora`
**Parameters:**
- `lora_nickname` (string or list of strings, required): A unique identifier for the LoRA adapter(s). Can be a single string or a list of strings for multiple LoRAs
- `lora_path` (string or list of strings/None, optional): Path to the `.safetensors` file(s) or Hugging Face repo ID(s). Required for the first load; optional if re-activating a cached nickname. If a list, must match the length of `lora_nickname`
- `target` (string or list of strings, optional): Which transformer(s) to apply the LoRA to. If a list, must match the length of `lora_nickname`. Valid values:
- `"all"` (default): Apply to all transformers
- `"transformer"`: Apply only to the primary transformer (high noise for Wan2.2)
- `"transformer_2"`: Apply only to transformer_2 (low noise for Wan2.2)
- `"critic"`: Apply only to the critic model
- `strength` (float or list of floats, optional): LoRA strength for merge, default 1.0. If a list, must match the length of `lora_nickname`. Values < 1.0 reduce the effect, values > 1.0 amplify the effect
**Single LoRA Example:**
```bash
curl -X POST http://localhost:30010/v1/set_lora \
-H "Content-Type: application/json" \
-d '{
"lora_nickname": "lora_name",
"lora_path": "/path/to/lora.safetensors",
"target": "all",
"strength": 0.8
}'
```
**Multiple LoRA Example:**
```bash
curl -X POST http://localhost:30010/v1/set_lora \
-H "Content-Type: application/json" \
-d '{
"lora_nickname": ["lora_1", "lora_2"],
"lora_path": ["/path/to/lora1.safetensors", "/path/to/lora2.safetensors"],
"target": ["transformer", "transformer_2"],
"strength": [0.8, 1.0]
}'
```
**Multiple LoRA with Same Target:**
```bash
curl -X POST http://localhost:30010/v1/set_lora \
-H "Content-Type: application/json" \
-d '{
"lora_nickname": ["style_lora", "character_lora"],
"lora_path": ["/path/to/style.safetensors", "/path/to/character.safetensors"],
"target": "all",
"strength": [0.7, 0.9]
}'
```
> [!NOTE]
> When using multiple LoRAs:
> - All list parameters (`lora_nickname`, `lora_path`, `target`, `strength`) must have the same length
> - If `target` or `strength` is a single value, it will be applied to all LoRAs
> - Multiple LoRAs applied to the same target will be merged in order
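The broadcasting rules above can be sketched as a small client-side validator (a hypothetical helper, not part of the server API): scalars are applied to every LoRA, while lists must match the length of `lora_nickname`.

```python
def normalize_lora_args(lora_nickname, lora_path=None, target="all", strength=1.0):
    """Expand set_lora parameters into per-adapter tuples, mirroring the
    broadcasting rules described above."""
    names = lora_nickname if isinstance(lora_nickname, list) else [lora_nickname]
    n = len(names)

    def broadcast(value, field):
        # A scalar applies to every LoRA; a list must match lora_nickname.
        values = value if isinstance(value, list) else [value] * n
        if len(values) != n:
            raise ValueError(f"{field} must match the length of lora_nickname")
        return values

    return list(zip(names,
                    broadcast(lora_path, "lora_path"),
                    broadcast(target, "target"),
                    broadcast(strength, "strength")))
```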
**Merge LoRA Weights**
Manually merges the currently set LoRA weights into the base model.
> [!NOTE]
> `set_lora` automatically performs a merge, so this endpoint is typically only needed if you have manually unmerged and want to re-apply the same LoRA without calling `set_lora` again.
**Endpoint:** `POST /v1/merge_lora_weights`
**Parameters:**
- `target` (string, optional): Which transformer(s) to merge. One of `"all"` (default), `"transformer"`, `"transformer_2"`, `"critic"`
- `strength` (float, optional): LoRA strength for merge, default 1.0. Values < 1.0 reduce the effect, values > 1.0 amplify the effect
**Curl Example:**
```bash
curl -X POST http://localhost:30010/v1/merge_lora_weights \
-H "Content-Type: application/json" \
-d '{"strength": 0.8}'
```
**Unmerge LoRA Weights**
Unmerges the currently active LoRA weights from the base model, restoring it to its original state. This **must** be called before setting a different LoRA.
**Endpoint:** `POST /v1/unmerge_lora_weights`
**Curl Example:**
```bash
curl -X POST http://localhost:30010/v1/unmerge_lora_weights \
-H "Content-Type: application/json"
```
**List LoRA Adapters**
Returns loaded LoRA adapters and current application status per module.
**Endpoint:** `GET /v1/list_loras`
**Curl Example:**
```bash
curl -sS -X GET "http://localhost:30010/v1/list_loras"
```
**Response Example:**
```json
{
  "loaded_adapters": [
    { "nickname": "lora_a", "path": "/weights/lora_a.safetensors" },
    { "nickname": "lora_b", "path": "/weights/lora_b.safetensors" }
  ],
  "active": {
    "transformer": [
      {
        "nickname": "lora2",
        "path": "tarn59/pixel_art_style_lora_z_image_turbo",
        "merged": true,
        "strength": 1.0
      }
    ]
  }
}
```
Notes:
- If LoRA is not enabled for the current pipeline, the server will return an error.
- `num_lora_layers_with_weights` counts only layers that have LoRA weights applied for the active adapter.
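Assuming the response shape shown above, a small helper can report which adapters are currently merged into each module (illustrative, not part of any SDK):

```python
def merged_adapters(response: dict) -> dict:
    """Map each module (e.g. "transformer") to the nicknames of LoRAs
    currently merged into it, from a /v1/list_loras response."""
    return {
        module: [a["nickname"] for a in adapters if a.get("merged")]
        for module, adapters in response.get("active", {}).items()
    }
```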
### Example: Switching LoRAs
1. Set LoRA A:
```bash
curl -X POST http://localhost:30010/v1/set_lora \
  -H "Content-Type: application/json" \
  -d '{"lora_nickname": "lora_a", "lora_path": "path/to/A"}'
```
2. Generate with LoRA A...
3. Unmerge LoRA A:
```bash
curl -X POST http://localhost:30010/v1/unmerge_lora_weights
```
4. Set LoRA B:
```bash
curl -X POST http://localhost:30010/v1/set_lora \
  -H "Content-Type: application/json" \
  -d '{"lora_nickname": "lora_b", "lora_path": "path/to/B"}'
```
5. Generate with LoRA B...
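The switch sequence above can be captured as a small planning helper (hypothetical and illustrative only): it emits the ordered API calls without performing them, which makes the unmerge-before-set rule explicit.

```python
def lora_switch_plan(new_nickname, new_path=None):
    """Ordered API calls for switching LoRAs: unmerge the active adapter,
    then set the new one (set_lora merges automatically)."""
    set_payload = {"lora_nickname": new_nickname}
    if new_path is not None:  # optional when re-activating a cached nickname
        set_payload["lora_path"] = new_path
    return [
        ("POST", "/v1/unmerge_lora_weights", None),
        ("POST", "/v1/set_lora", set_payload),
    ]
```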
### Adjust Output Quality
The server supports adjusting output quality and compression levels for both image and video generation through the `output-quality` and `output-compression` parameters.
#### Parameters
- **`output-quality`** (string, optional): Preset quality level that automatically sets compression. **Default is `"default"`**. Valid values:
- `"maximum"`: Highest quality (100)
- `"high"`: High quality (90)
- `"medium"`: Medium quality (55)
- `"low"`: Lower quality (35)
- `"default"`: Auto-adjust based on media type (50 for video, 75 for image)
- **`output-compression`** (integer, optional): Direct compression level override (0-100). **Default is `None`**. When provided (not `None`), takes precedence over `output-quality`.
- `0`: Lowest quality, smallest file size
- `100`: Highest quality, largest file size
#### Notes
- **Precedence**: When both `output-quality` and `output-compression` are provided, `output-compression` takes precedence
- **Format Support**: Quality settings apply to JPEG and video formats. PNG uses lossless compression and ignores these settings
- **File Size vs Quality**: Lower compression values (or the `"low"` quality preset) produce smaller files but may show visible artifacts
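The precedence and preset rules above can be sketched as follows (a hypothetical client-side helper that mirrors the documented mapping, not server code):

```python
# Preset-to-compression mapping as documented above.
QUALITY_PRESETS = {"maximum": 100, "high": 90, "medium": 55, "low": 35}


def resolve_compression(media_type, output_quality="default", output_compression=None):
    """Resolve the effective compression level (0-100): an explicit
    output-compression overrides the output-quality preset, and
    "default" picks 50 for video or 75 for image."""
    if output_compression is not None:
        return output_compression  # explicit override wins
    if output_quality == "default":
        return 50 if media_type == "video" else 75
    return QUALITY_PRESETS[output_quality]
```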