Add SGLang serving instructions

#14
by MickJ - opened
Files changed (1)
  1. README.md +62 -0
README.md CHANGED
@@ -10,6 +10,8 @@ tags:
  - cosmos3
  - vllm
  - vllm-omni
+ - sglang
+ - sglang-diffusion
  - diffusers
  - text, image, video, audio, and action generation
  - omnimodel
@@ -922,6 +924,66 @@ Cosmos3 may produce imperfect outputs in challenging scenarios. Generation artif
 
  Cosmos3 outputs should not be treated as physically accurate simulation, reliable ground-truth reasoning, or safety-certified decision making. Applications involving robotics control, autonomous systems, scientific simulation, or safety-critical planning require additional validation, external constraints, system-level safety analysis, and domain-specific guardrails before deployment.
 
+
+ ## SGLang Serve
+
+ [SGLang Diffusion](https://github.com/sgl-project/sglang) can serve Cosmos3-Nano through OpenAI-compatible image and video endpoints. Install SGLang from source with diffusion dependencies, then start a server:
+
+ ```shell
+ git clone https://github.com/sgl-project/sglang.git
+ cd sglang
+ pip install -e "python[diffusion]"
+ pip install "cosmos-guardrail==0.3.1"
+
+ sglang serve --model-path nvidia/Cosmos3-Nano
+ ```
+
+ For a video-specialized checkpoint, use `Cosmos3-Super-Image2Video` with multiple GPUs:
+
+ ```shell
+ sglang serve \
+   --model-path nvidia/Cosmos3-Super-Image2Video \
+   --num-gpus 4
+ ```
+
+ Supported SGLang endpoints:
+
+ | Mode | Endpoint | Notes |
+ | --- | --- | --- |
+ | Text to image | `POST /v1/images/generations` | Returns base64 image data by default |
+ | Text to video | `POST /v1/videos` | Creates an async job; poll `GET /v1/videos/{id}` and download `/content` |
+ | Image to video | `POST /v1/videos` | Upload the conditioning image with `input_reference` |
+
+ Example text-to-video request:
+
+ ```shell
+ # Submit the job and capture its id from the JSON response.
+ job_id=$(curl -sS -X POST http://localhost:30000/v1/videos \
+   --form-string "prompt=A small warehouse robot moves a blue box across a clean floor." \
+   --form-string "negative_prompt=blurry, distorted, low quality" \
+   --form-string "size=1280x720" \
+   --form-string "num_frames=81" \
+   --form-string "fps=24" \
+   --form-string "num_inference_steps=35" \
+   --form-string "guidance_scale=4.0" \
+   --form-string "flow_shift=10.0" \
+   --form-string "seed=42" \
+   --form-string 'extra_params={"guardrails":true,"use_resolution_template":false,"use_duration_template":false}' \
+   | python -c 'import json, sys; print(json.load(sys.stdin)["id"])')
+
+ # Poll once per second until the job completes or fails.
+ while true; do
+   status=$(curl -sS "http://localhost:30000/v1/videos/${job_id}" \
+     | python -c 'import json, sys; print(json.load(sys.stdin)["status"])')
+   [ "$status" = "completed" ] && break
+   [ "$status" = "failed" ] && exit 1
+   sleep 1
+ done
+
+ # Download the finished video.
+ curl -sS -L "http://localhost:30000/v1/videos/${job_id}/content" \
+   -o cosmos3_t2v_output.mp4
+ ```
+
+ SGLang accepts Cosmos3 request options including `max_sequence_length`, `flow_shift`, `extra_params.guardrails`, `extra_params.use_resolution_template`, and `extra_params.use_duration_template`. Video-to-video, video-with-sound, and action generation are not yet supported by SGLang.
+
  ## Inference
 
  **Acceleration Engine:** [PyTorch](https://pytorch.org/), [vLLM](https://github.com/vllm-project/vllm), [vLLM-Omni](https://github.com/vllm-project/vllm-omni), [Hugging Face Diffusers](https://github.com/huggingface/diffusers)
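
The added section lists `POST /v1/images/generations` but only shows a text-to-video request. A minimal sketch of a text-to-image call follows, under several assumptions the diff does not confirm: the endpoint accepts an OpenAI-style JSON body with `prompt` and `size`, the image comes back under `data[0].b64_json`, and `1280x720` (borrowed from the video example) is an accepted size. The helper names `generate_image` and `save_first_image` and the `SGLANG_URL` variable are hypothetical.

```shell
# Assumption: the server started by `sglang serve` listens on localhost:30000,
# as in the text-to-video example. Override SGLANG_URL if it differs.
SGLANG_URL="${SGLANG_URL:-http://localhost:30000}"

# Hypothetical helper: POST a prompt ($1) and optional size ($2) to the
# text-to-image endpoint and print the raw JSON response.
# The JSON body shape is an assumption (OpenAI-style), not taken from the diff.
generate_image() {
  # json.dumps escapes quotes in the prompt; curl reads the body from stdin.
  python3 -c 'import json, sys; print(json.dumps({"prompt": sys.argv[1], "size": sys.argv[2]}))' \
    "$1" "${2:-1280x720}" \
  | curl -sS -X POST "${SGLANG_URL}/v1/images/generations" \
      -H "Content-Type: application/json" \
      -d @-
}

# Hypothetical helper: read a response on stdin and write the first image to
# the path given as $1. Assumes the image is base64 text at data[0].b64_json.
save_first_image() {
  python3 -c '
import base64, json, sys
response = json.load(sys.stdin)
with open(sys.argv[1], "wb") as out:
    out.write(base64.b64decode(response["data"][0]["b64_json"]))
' "$1"
}
```

With a server running, the two helpers could be chained as `generate_image "A small warehouse robot beside a blue box." | save_first_image cosmos3_t2i_output.png`. Keeping the decode step separate makes it possible to check the response handling without a GPU by piping in a saved response.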