MickJ committed on
Commit 45d02a2 · verified · 1 Parent(s): 3965cee

Add SGLang serving instructions

Add SGLang-Diffusion model card examples and tags for Cosmos3 serving.

Files changed (1)
  1. README.md +63 -0
README.md CHANGED
@@ -10,6 +10,8 @@ tags:
  - cosmos3
  - vllm
  - vllm-omni
+ - sglang
+ - sglang-diffusion
  - diffusers
  - text, image, video, audio, and action generation
  - omnimodel
@@ -927,6 +929,67 @@ Cosmos3 may produce imperfect outputs in challenging scenarios. Generation artif
 
  Cosmos3 outputs should not be treated as physically accurate simulation, reliable ground-truth reasoning, or safety-certified decision making. Applications involving robotics control, autonomous systems, scientific simulation, or safety-critical planning require additional validation, external constraints, system-level safety analysis, and domain-specific guardrails before deployment.
 
+ ### SGLang
+
+ SGLang-Diffusion can serve `nvidia/Cosmos3-Super` through OpenAI-compatible image and video endpoints. Install SGLang from source with diffusion dependencies, then start the server:
+
+ ```bash
+ git clone https://github.com/sgl-project/sglang.git
+ cd sglang
+ pip install -e "python[diffusion]"
+ pip install "cosmos-guardrail==0.3.1"
+
+ sglang serve \
+   --model-path nvidia/Cosmos3-Super \
+   --num-gpus 4
+ ```
+
+ For the video-specialized checkpoint:
+
+ ```bash
+ sglang serve \
+   --model-path nvidia/Cosmos3-Super-Image2Video \
+   --num-gpus 4
+ ```
+
+ Supported SGLang endpoints:
+
+ | Mode | Endpoint | Notes |
+ | --- | --- | --- |
+ | Text to image | `POST /v1/images/generations` | Returns base64 image data by default |
+ | Text to video | `POST /v1/videos` | Creates an async job; poll `GET /v1/videos/{id}` and download `/content` |
+ | Image to video | `POST /v1/videos` | Upload the conditioning image with `input_reference` |
+
+ Example text-to-video request:
+
+ ```bash
+ job_id=$(curl -sS -X POST http://localhost:30000/v1/videos \
+   --form-string "prompt=A small warehouse robot moves a blue box across a clean floor." \
+   --form-string "negative_prompt=blurry, distorted, low quality" \
+   --form-string "size=1280x720" \
+   --form-string "num_frames=81" \
+   --form-string "fps=24" \
+   --form-string "num_inference_steps=35" \
+   --form-string "guidance_scale=4.0" \
+   --form-string "flow_shift=10.0" \
+   --form-string "seed=42" \
+   --form-string 'extra_params={"guardrails":true,"use_resolution_template":false,"use_duration_template":false}' \
+   | python -c 'import json, sys; print(json.load(sys.stdin)["id"])')
+
+ while true; do
+   status=$(curl -sS "http://localhost:30000/v1/videos/${job_id}" \
+     | python -c 'import json, sys; print(json.load(sys.stdin)["status"])')
+   [ "$status" = "completed" ] && break
+   [ "$status" = "failed" ] && exit 1
+   sleep 1
+ done
+
+ curl -sS -L "http://localhost:30000/v1/videos/${job_id}/content" \
+   -o cosmos3_super_t2v_output.mp4
+ ```
+
+ Video-to-video, video-with-sound, and action generation are not supported by SGLang yet.
+
  ## Inference
 
  **Acceleration Engine:** [PyTorch](https://pytorch.org/), [vLLM](https://github.com/vllm-project/vllm), [vLLM-Omni](https://github.com/vllm-project/vllm-omni), [Hugging Face Diffusers](https://github.com/huggingface/diffusers)
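
The polling step of the text-to-video example in this commit (`GET /v1/videos/{id}` until the job finishes) can also be sketched in Python. This is an illustration only and is not part of the commit: the names `fetch_status`, `wait_for_video`, `interval_s`, and `timeout_s` are hypothetical, and the sketch assumes, as the shell loop implies, that the response is JSON with a `status` field that eventually reads `completed` or `failed`. Unlike the shell loop, it gives up after a timeout instead of polling forever.

```python
import json
import time
import urllib.request

# Assumption: same host and port as the curl example in the commit.
BASE_URL = "http://localhost:30000"


def fetch_status(job_id, base_url=BASE_URL):
    """GET /v1/videos/{id} and return the "status" field of the JSON reply."""
    with urllib.request.urlopen(f"{base_url}/v1/videos/{job_id}") as resp:
        return json.load(resp)["status"]


def wait_for_video(job_id, fetch=fetch_status, interval_s=1.0, timeout_s=600.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll until the job is "completed"; raise on "failed" or on timeout.

    `fetch`, `sleep`, and `clock` are injectable so the loop can be
    exercised without a running server.
    """
    deadline = clock() + timeout_s
    while True:
        status = fetch(job_id)
        if status == "completed":
            return status
        if status == "failed":
            raise RuntimeError(f"video job {job_id} failed")
        # Any other status is treated as "still running".
        if clock() >= deadline:
            raise TimeoutError(f"video job {job_id} not done after {timeout_s}s")
        sleep(interval_s)
```

With a server running, `wait_for_video(job_id)` would block until the job completes, after which the `/content` download from the commit's example applies unchanged.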