MickJ committed on
Commit
af1b697
·
verified ·
1 Parent(s): 3965cee

Add SGLang serving instructions

Add SGLang-Diffusion model card examples and tags for Cosmos3 serving.

Files changed (1)
  1. README.md +63 -0
README.md CHANGED
@@ -10,6 +10,8 @@ tags:
10
  - cosmos3
11
  - vllm
12
  - vllm-omni
13
  - diffusers
14
  - text, image, video, audio, and action generation
15
  - omnimodel
@@ -854,6 +856,67 @@ Example output from the command above:
854
  4. Place the flower into the red bottle.
855
  ```
856
 
857
  ### Diffusers
858
 
859
  #### Container
 
10
  - cosmos3
11
  - vllm
12
  - vllm-omni
13
+ - sglang
14
+ - sglang-diffusion
15
  - diffusers
16
  - text, image, video, audio, and action generation
17
  - omnimodel
 
856
  4. Place the flower into the red bottle.
857
  ```
858
 
859
+ ### SGLang
860
+
861
+ SGLang-Diffusion can serve `nvidia/Cosmos3-Super` through OpenAI-compatible image and video endpoints. Install SGLang from source with diffusion dependencies, then start the server:
862
+
863
+ ```bash
864
+ git clone https://github.com/sgl-project/sglang.git
865
+ cd sglang
866
+ pip install -e "python[diffusion]"
867
+ pip install "cosmos-guardrail==0.3.1"
868
+
869
+ sglang serve \
870
+ --model-path nvidia/Cosmos3-Super \
871
+ --num-gpus 4
872
+ ```
873
+
874
+ For the video-specialized checkpoint:
875
+
876
+ ```bash
877
+ sglang serve \
878
+ --model-path nvidia/Cosmos3-Super-Image2Video \
879
+ --num-gpus 4
880
+ ```
881
+
882
+ Supported SGLang endpoints:
883
+
884
+ | Mode | Endpoint | Notes |
885
+ | --- | --- | --- |
886
+ | Text to image | `POST /v1/images/generations` | Returns base64 image data by default |
887
+ | Text to video | `POST /v1/videos` | Creates an async job; poll `GET /v1/videos/{id}` and download `/content` |
888
+ | Image to video | `POST /v1/videos` | Upload the conditioning image with `input_reference` |
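
The text-to-image row has no request example in this change, so here is a minimal Python sketch of one. It is built on assumptions, not taken from the SGLang docs: the request is assumed to accept a JSON body with a `prompt` field, and the response is assumed to follow the OpenAI images layout (`data[0].b64_json`). The host and port are copied from the video example in this README. The helper names (`build_image_request`, `decode_first_image`, `generate_image`) are hypothetical.

```python
import base64
import json
import urllib.request

# Same host and port as the text-to-video example in this README.
BASE_URL = "http://localhost:30000"


def build_image_request(prompt: str, **extra_fields) -> urllib.request.Request:
    """Build (but do not send) a JSON POST for /v1/images/generations.

    Assumption: the endpoint accepts an OpenAI-style JSON body with a
    "prompt" field. Any extra_fields are passed through unchanged.
    """
    payload = {"prompt": prompt, **extra_fields}
    return urllib.request.Request(
        f"{BASE_URL}/v1/images/generations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def decode_first_image(response_body: bytes) -> bytes:
    """Return raw image bytes from a response body.

    Assumption: the base64 data sits at data[0].b64_json, as in the
    OpenAI images API.
    """
    doc = json.loads(response_body)
    return base64.b64decode(doc["data"][0]["b64_json"])


def generate_image(prompt: str, out_path: str) -> None:
    """Send the request and write the decoded image to out_path."""
    with urllib.request.urlopen(build_image_request(prompt)) as resp:
        image_bytes = decode_first_image(resp.read())
    with open(out_path, "wb") as fh:
        fh.write(image_bytes)
```

With a server running, `generate_image("A red bottle on a wooden table.", "cosmos3_super_t2i_output.png")` would write the image to disk, provided the assumed response layout holds.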
889
+
890
+ Example text-to-video request:
891
+
892
+ ```bash
893
+ job_id=$(curl -sS -X POST http://localhost:30000/v1/videos \
894
+ --form-string "prompt=A small warehouse robot moves a blue box across a clean floor." \
895
+ --form-string "negative_prompt=blurry, distorted, low quality" \
896
+ --form-string "size=1280x720" \
897
+ --form-string "num_frames=81" \
898
+ --form-string "fps=24" \
899
+ --form-string "num_inference_steps=35" \
900
+ --form-string "guidance_scale=4.0" \
901
+ --form-string "flow_shift=10.0" \
902
+ --form-string "seed=42" \
903
+ --form-string 'extra_params={"guardrails":true,"use_resolution_template":false,"use_duration_template":false}' \
904
+ | python -c 'import json, sys; print(json.load(sys.stdin)["id"])')
905
+
906
+ while true; do
907
+ status=$(curl -sS "http://localhost:30000/v1/videos/${job_id}" \
908
+ | python -c 'import json, sys; print(json.load(sys.stdin)["status"])')
909
+ [ "$status" = "completed" ] && break
910
+ [ "$status" = "failed" ] && exit 1
911
+ sleep 1
912
+ done
913
+
914
+ curl -sS -L "http://localhost:30000/v1/videos/${job_id}/content" \
915
+ -o cosmos3_super_t2v_output.mp4
916
+ ```
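
The shell loop above polls forever if a job never reaches `completed` or `failed`. A Python client could bound the wait, as in this minimal sketch. It uses only the `status` values the shell example checks and treats any other value as "still running". The 30-minute default timeout and the helper names (`fetch_status`, `wait_for_video`) are hypothetical.

```python
import json
import time
import urllib.request
from typing import Callable

# Same host and port as the shell example.
BASE_URL = "http://localhost:30000"


def fetch_status(job_id: str) -> str:
    """GET /v1/videos/{id} and return the "status" field of the JSON reply."""
    with urllib.request.urlopen(f"{BASE_URL}/v1/videos/{job_id}") as resp:
        return json.load(resp)["status"]


def wait_for_video(
    job_id: str,
    fetch: Callable[[str], str] = fetch_status,
    interval_s: float = 1.0,
    timeout_s: float = 1800.0,  # hypothetical default, tune per workload
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll until the job is "completed"; raise on "failed" or on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch(job_id)
        if status == "completed":
            return
        if status == "failed":
            raise RuntimeError(f"video job {job_id} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"video job {job_id} still {status!r} after {timeout_s}s"
            )
        sleep(interval_s)
```

Because `fetch` and `sleep` are parameters, the loop can be exercised without a running server. Raising an exception instead of calling `exit 1` also keeps an interactive session open when a job fails.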
917
+
918
+ Video-to-video, video-with-sound, and action generation are not supported by SGLang yet.
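
For the image-to-video mode in the endpoint table, the conditioning image goes up as the `input_reference` form field on `POST /v1/videos`. Below is a minimal standard-library sketch of how such a multipart request could be assembled. The helper names (`encode_multipart`, `build_i2v_request`), the `input.png` filename, and the `image/png` content type are hypothetical. Only `prompt` and `input_reference` come from this README. The sketch does not escape quotes in field names or filenames.

```python
import urllib.request
import uuid

# Same host and port as the text-to-video example.
BASE_URL = "http://localhost:30000"


def encode_multipart(fields: dict, file_field: str, filename: str,
                     file_bytes: bytes, content_type: str):
    """Encode text fields plus one file as multipart/form-data.

    Returns a (body, content_type_header) tuple.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (f'--{boundary}\r\n'
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f'{value}\r\n').encode("utf-8")
        )
    parts.append(
        (f'--{boundary}\r\n'
         f'Content-Disposition: form-data; name="{file_field}"; '
         f'filename="{filename}"\r\n'
         f'Content-Type: {content_type}\r\n\r\n').encode("utf-8")
        + file_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_i2v_request(prompt: str, image_bytes: bytes) -> urllib.request.Request:
    """Build (but do not send) an image-to-video job request."""
    body, header = encode_multipart(
        {"prompt": prompt},   # further form fields could be added here
        "input_reference",    # upload field named in the endpoint table
        "input.png",          # hypothetical filename
        image_bytes,
        "image/png",          # hypothetical content type
    )
    return urllib.request.Request(
        f"{BASE_URL}/v1/videos",
        data=body,
        headers={"Content-Type": header},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would submit the job. The returned job can then be polled and downloaded the same way as in the text-to-video example.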
919
+
920
  ### Diffusers
921
 
922
  #### Container