Add SGLang serving instructions

#5
by MickJ - opened
Files changed (1)
  1. README.md +46 -0
README.md CHANGED
@@ -9,6 +9,8 @@ tags:
  - cosmos
  - cosmos3
  - vllm-omni
+ - sglang
+ - sglang-diffusion
  - diffusers
  - image-to-video
  - video-generation

@@ -412,6 +414,50 @@ python scripts/upsample_prompt.py \
  --output-path scripts/upsampled.json
  ```
 
+ ### SGLang
+
+ SGLang-Diffusion can serve `nvidia/Cosmos3-Super-Image2Video` through the OpenAI-compatible async video endpoint. Install SGLang from source with diffusion dependencies, then start the server:
+
+ ```bash
+ git clone https://github.com/sgl-project/sglang.git
+ cd sglang
+ pip install -e "python[diffusion]"
+ pip install "cosmos-guardrail==0.3.1"
+
+ sglang serve \
+ --model-path nvidia/Cosmos3-Super-Image2Video \
+ --num-gpus 4
+ ```
+
+ Example image-to-video request:
+
+ ```bash
+ job_id=$(curl -sS -X POST http://localhost:30000/v1/videos \
+ --form-string "prompt=A small warehouse robot moves a blue box across a clean floor." \
+ --form-string "negative_prompt=blurry, distorted, low quality" \
+ --form-string "size=1280x720" \
+ --form-string "num_frames=81" \
+ --form-string "fps=24" \
+ --form-string "num_inference_steps=35" \
+ --form-string "guidance_scale=4.0" \
+ --form-string "flow_shift=10.0" \
+ --form-string "seed=42" \
+ --form-string 'extra_params={"guardrails":true,"use_resolution_template":false,"use_duration_template":false}' \
+ -F "input_reference=@input.png" \
+ | python -c 'import json, sys; print(json.load(sys.stdin)["id"])')
+
+ while true; do
+ status=$(curl -sS "http://localhost:30000/v1/videos/${job_id}" \
+ | python -c 'import json, sys; print(json.load(sys.stdin)["status"])')
+ [ "$status" = "completed" ] && break
+ [ "$status" = "failed" ] && exit 1
+ sleep 1
+ done
+
+ curl -sS -L "http://localhost:30000/v1/videos/${job_id}/content" \
+ -o cosmos3_super_i2v_output.mp4
+ ```
+
  ### Diffusers
 
  Cosmos3 is fully supported within the popular HuggingFace Diffusers package. This integration makes it a supported inference backend, allowing developers to easily incorporate Cosmos3's capabilities - such as text-to-image generation - into their pipelines using the Cosmos3OmniPipeline class, as demonstrated by the provided code examples (see examples for other modalities on the HuggingFace Cosmos3 page).
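The polling loop in the proposed SGLang example has no upper bound on how long it waits, and its `exit 1` terminates whatever shell runs it, which for a snippet pasted into a terminal is the user's own session. Below is a minimal sketch of a bounded alternative, assuming the same endpoint (`GET /v1/videos/<id>`) and the `completed`/`failed` status values used in the example. `BASE_URL`, `fetch_status`, `wait_for_video`, and the default timeout and interval are hypothetical names and values chosen for illustration, not part of SGLang.

```bash
#!/usr/bin/env bash
# Sketch: bounded polling for an SGLang-Diffusion video job.
# Assumes GET ${BASE_URL}/v1/videos/<id> returns JSON whose "status" field
# eventually becomes "completed" or "failed", as in the README example.

# Hypothetical default; override to point at another host or port.
BASE_URL="${BASE_URL:-http://localhost:30000}"

# Print the job's current status; prints nothing if the request or JSON parse fails.
fetch_status() {
  curl -s "${BASE_URL}/v1/videos/$1" \
    | python -c 'import json, sys; print(json.load(sys.stdin)["status"])' 2>/dev/null
}

# wait_for_video JOB_ID [TIMEOUT_SECONDS] [POLL_INTERVAL_SECONDS]
# Returns 0 on "completed", 1 on "failed", 2 if the deadline passes first.
# Uses `return` instead of `exit`, so calling it from an interactive shell is safe.
wait_for_video() {
  local job_id="$1" timeout="${2:-1800}" interval="${3:-2}"
  local deadline=$(( SECONDS + timeout ))
  local status=""
  while (( SECONDS < deadline )); do
    status="$(fetch_status "$job_id")"
    case "$status" in
      completed) return 0 ;;
      failed) echo "job ${job_id} failed" >&2; return 1 ;;
    esac
    sleep "$interval"
  done
  echo "job ${job_id} not finished after ${timeout}s (last status: ${status:-unknown})" >&2
  return 2
}

# Example usage after submitting the job as in the README:
#   if wait_for_video "$job_id" 1800 2; then
#     curl -sS -L "${BASE_URL}/v1/videos/${job_id}/content" -o cosmos3_super_i2v_output.mp4
#   fi
```

Keeping the HTTP call in `fetch_status` separates the loop logic from the server, so it can be exercised with a stub function. Any value other than the two terminal statuses, including an empty string when the request or JSON parse fails, simply keeps polling until the deadline.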
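In the SGLang example, the request is sent right after `sglang serve` is started. If the server is not listening yet, `curl` fails with a connection error instead of returning a job id. Below is a minimal sketch of a readiness wait. It treats any HTTP response as "up" rather than relying on a specific health route, and it assumes the listener on port 30000 only accepts connections once the model is loaded. If the listener comes up earlier, the first request may still wait on the server. `wait_for_server` and its default timeout and interval are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: block until an HTTP server answers at URL, or give up after a deadline.
# Without --fail, curl exits 0 for any HTTP response (even a 404) and non-zero
# (e.g. 7) while the connection is refused, so no specific health route is assumed.

# wait_for_server URL [TIMEOUT_SECONDS] [POLL_INTERVAL_SECONDS]
# Returns 0 once the server responds, 1 if the deadline passes first.
wait_for_server() {
  local url="$1" timeout="${2:-900}" interval="${3:-5}"
  local deadline=$(( SECONDS + timeout ))
  until curl -s -o /dev/null --max-time 5 "$url"; do
    if (( SECONDS >= deadline )); then
      echo "no response from ${url} after ${timeout}s" >&2
      return 1
    fi
    sleep "$interval"
  done
}

# Example usage, in a second terminal after starting `sglang serve`:
#   wait_for_server http://localhost:30000 && echo "server is accepting requests"
```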