nvidia / Cosmos3-Nano

text, image, video, audio, and action generation


MickJ committed 1 day ago

Commit cfa6ad7 · 1 parent: 24d4790

Use async SGLang video API

Files changed (1)

README.md +16 -4


@@ -928,6 +928,8 @@ Cosmos3 outputs should not be treated as physically accurate simulation, reliabl
 [SGLang Diffusion](https://github.com/sgl-project/sglang) can serve Cosmos3-Nano through OpenAI-compatible image and video endpoints. Install SGLang from source with diffusion dependencies, then start a server:
 ```shell
+git clone https://github.com/sgl-project/sglang.git
+cd sglang
 pip install -e "python[diffusion]"
 pip install "cosmos-guardrail==0.3.1"
@@ -945,14 +947,13 @@ Supported SGLang endpoints:
 | Mode | Endpoint | Notes |
 | --- | --- | --- |
 | Text to image | `POST /v1/images/generations` | Returns base64 image data by default |
-| Text to video | `POST /v1/videos/sync` or `POST /v1/videos` | `/sync` blocks and returns MP4 bytes with `Accept: video/mp4` |
-| Image to video | `POST /v1/videos/sync` or `POST /v1/videos` | Upload the conditioning image with `input_reference` |
+| Text to video | `POST /v1/videos` | Creates an async job; poll `GET /v1/videos/{id}` and download `/content` |
+| Image to video | `POST /v1/videos` | Upload the conditioning image with `input_reference` |
 Example text-to-video request:
 ```shell
-curl -sS -X POST http://localhost:8000/v1/videos/sync \
-  -H "Accept: video/mp4" \
+job_id=$(curl -sS -X POST http://localhost:8000/v1/videos \
   --form-string "prompt=A small warehouse robot moves a blue box across a clean floor." \
   --form-string "negative_prompt=blurry, distorted, low quality" \
   --form-string "size=1280x720" \
@@ -963,6 +964,17 @@ curl -sS -X POST http://localhost:8000/v1/videos/sync \
   --form-string "flow_shift=10.0" \
   --form-string "seed=42" \
   --form-string 'extra_params={"guardrails":true,"use_resolution_template":false,"use_duration_template":false}' \
+  | python -c 'import json, sys; print(json.load(sys.stdin)["id"])')
+while true; do
+  status=$(curl -sS "http://localhost:8000/v1/videos/${job_id}" \
+    | python -c 'import json, sys; print(json.load(sys.stdin)["status"])')
+  [ "$status" = "completed" ] && break
+  [ "$status" = "failed" ] && exit 1
+  sleep 1
+done
+curl -sS -L "http://localhost:8000/v1/videos/${job_id}/content" \
   -o cosmos3_t2v_output.mp4
 ```
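The polling loop added in this commit has no upper bound on how long it waits. It also uses `exit 1`, which closes an interactive shell if the snippet is pasted into a terminal. Below is a minimal sketch of a bounded alternative. It assumes the same server address and the `completed`/`failed` status values that the README loop checks. The helper names `get_video_status` and `wait_for_video`, the `BASE_URL` variable, and the retry defaults are hypothetical illustrations, not part of SGLang.

```shell
#!/usr/bin/env bash
# Sketch only: bounded polling for an async SGLang video job.
# Hypothetical (not from SGLang or the README): BASE_URL, the helper names,
# and the retry defaults. "completed" and "failed" are the status values
# checked by the README loop in this commit.

BASE_URL="${BASE_URL:-http://localhost:8000}"

# Print the "status" field returned by GET /v1/videos/{id}.
get_video_status() {
  curl -sS "${BASE_URL}/v1/videos/$1" \
    | python -c 'import json, sys; print(json.load(sys.stdin)["status"])'
}

# wait_for_video JOB_ID [MAX_ATTEMPTS] [INTERVAL_SECONDS]
# Returns 0 when the job completes, 1 when it fails, 2 on timeout.
# Uses `return` instead of `exit`, so an interactive shell stays open.
wait_for_video() {
  local job_id="$1"
  local max_attempts="${2:-600}"
  local interval="${3:-1}"
  local attempt=0
  local status
  while [ "$attempt" -lt "$max_attempts" ]; do
    status=$(get_video_status "$job_id")
    case "$status" in
      completed) return 0 ;;
      failed) return 1 ;;
    esac
    # Any other status (or an empty one) counts as "still running".
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  return 2
}
```

Here is how it fits after the `job_id=$(...)` step of the README request: `wait_for_video "$job_id" && curl -sS -L "${BASE_URL}/v1/videos/${job_id}/content" -o cosmos3_t2v_output.mp4` downloads the MP4 only after a successful run. With the defaults of 600 attempts and a 1-second interval, the ceiling is roughly ten minutes plus request time, which may need raising for long generations.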