nvidia/Cosmos3-Super-Image2Video

@@ -9,54 +9,11 @@ tags:
   - cosmos
   - cosmos3
   - vllm-omni
   - diffusers
   - image-to-video
   - video-generation
-countDownloads:
-  - checkpoint.json
-  - config.json
-  - generation_config.json
-  - model.safetensors.index.json
-  - model_index.json
-  - tokenizer.json
-  - tokenizer_config.json
-  - sound_tokenizer/config.json
-  - sound_tokenizer/diffusion_pytorch_model.safetensors
-  - text_tokenizer/tokenizer.json
-  - text_tokenizer/tokenizer_config.json
-  - transformer/config.json
-  - transformer/diffusion_pytorch_model-00001-of-00027.safetensors
-  - transformer/diffusion_pytorch_model-00002-of-00027.safetensors
-  - transformer/diffusion_pytorch_model-00003-of-00027.safetensors
-  - transformer/diffusion_pytorch_model-00004-of-00027.safetensors
-  - transformer/diffusion_pytorch_model-00005-of-00027.safetensors
-  - transformer/diffusion_pytorch_model-00006-of-00027.safetensors
-  - transformer/diffusion_pytorch_model-00007-of-00027.safetensors
-  - transformer/diffusion_pytorch_model-00008-of-00027.safetensors
-  - transformer/diffusion_pytorch_model-00009-of-00027.safetensors
-  - transformer/diffusion_pytorch_model-00010-of-00027.safetensors
-  - transformer/diffusion_pytorch_model-00011-of-00027.safetensors
-  - transformer/diffusion_pytorch_model-00012-of-00027.safetensors
-  - transformer/diffusion_pytorch_model-00013-of-00027.safetensors
-  - transformer/diffusion_pytorch_model-00014-of-00027.safetensors
-  - transformer/diffusion_pytorch_model-00015-of-00027.safetensors
-  - transformer/diffusion_pytorch_model-00016-of-00027.safetensors
-  - transformer/diffusion_pytorch_model-00017-of-00027.safetensors
-  - transformer/diffusion_pytorch_model-00018-of-00027.safetensors
-  - transformer/diffusion_pytorch_model-00019-of-00027.safetensors
-  - transformer/diffusion_pytorch_model-00020-of-00027.safetensors
-  - transformer/diffusion_pytorch_model-00021-of-00027.safetensors
-  - transformer/diffusion_pytorch_model-00022-of-00027.safetensors
-  - transformer/diffusion_pytorch_model-00023-of-00027.safetensors
-  - transformer/diffusion_pytorch_model-00024-of-00027.safetensors
-  - transformer/diffusion_pytorch_model-00025-of-00027.safetensors
-  - transformer/diffusion_pytorch_model-00026-of-00027.safetensors
-  - transformer/diffusion_pytorch_model-00027-of-00027.safetensors
-  - transformer/diffusion_pytorch_model.safetensors.index.json
-  - vae/config.json
-  - vae/diffusion_pytorch_model.safetensors
-  - vision_encoder/config.json
-  - vision_encoder/model.safetensors
 ---
 # **Cosmos 3: Omnimodal World Models for Physical AI**
@@ -211,6 +168,7 @@ Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated sys
 - [PyTorch](https://github.com/nvidia/cosmos3)
 - [vLLM-Omni](https://github.com/vllm-project/vllm-omni)
 - [Hugging Face Diffusers](https://huggingface.co/docs/diffusers/en/index)
 **Supported Hardware Microarchitecture Compatibility:**
@@ -457,6 +415,54 @@ python scripts/upsample_prompt.py \
     --output-path scripts/upsampled.json
 ```
 ### Diffusers
 Cosmos3 is supported as an inference backend in Hugging Face Diffusers. Developers can add Cosmos3 capabilities, such as text-to-image generation, to their pipelines through the `Cosmos3OmniPipeline` class, as the provided code examples demonstrate (see the Hugging Face Cosmos3 page for examples of other modalities).
@@ -535,7 +541,7 @@ Cosmos3 outputs should not be treated as physically accurate simulation, reliabl
 ## Inference
-**Acceleration Engine:** [PyTorch](https://pytorch.org/), [vLLM](https://github.com/vllm-project/vllm), [vLLM-Omni](https://github.com/vllm-project/vllm-omni), [Hugging Face Diffusers](https://github.com/huggingface/diffusers)
 **Test Hardware:** GB200 and H100

   - cosmos
   - cosmos3
   - vllm-omni
+  - sglang
+  - sglang-diffusion
   - diffusers
   - image-to-video
   - video-generation
 ---
 # **Cosmos 3: Omnimodal World Models for Physical AI**
 - [PyTorch](https://github.com/nvidia/cosmos3)
 - [vLLM-Omni](https://github.com/vllm-project/vllm-omni)
 - [Hugging Face Diffusers](https://huggingface.co/docs/diffusers/en/index)
+- [SGLang](https://github.com/sgl-project/sglang)
 **Supported Hardware Microarchitecture Compatibility:**
     --output-path scripts/upsampled.json
 ```
+### SGLang
+[SGLang Diffusion](https://docs.sglang.io/docs/sglang-diffusion/index) can serve `nvidia/Cosmos3-Super-Image2Video` through OpenAI-compatible video generation endpoints. Install SGLang from the main branch with diffusion dependencies, then start the server:
+```bash
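+# Install SGLang from the main branch with the diffusion extras, plus the Cosmos guardrail package.
+git clone --branch main https://github.com/sgl-project/sglang.git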
+cd sglang
+pip install -e "python[diffusion]"
+pip install "cosmos-guardrail==0.3.1"
+# Start the server with 4 GPUs.
+sglang serve \
+  --model-path nvidia/Cosmos3-Super-Image2Video \
+  --num-gpus 4
+```
+Cosmos 3 support in SGLang Diffusion currently requires the SGLang main branch. Switch to a stable SGLang release once Cosmos 3 support is included there.
+Example image-to-video request:
+```bash
+# Submit the image-to-video job and capture the returned job id.
+job_id=$(curl -sS -X POST http://localhost:30000/v1/videos \
+  --form-string "prompt=A small warehouse robot moves a blue box across a clean floor." \
+  --form-string "negative_prompt=blurry, distorted, low quality" \
+  --form-string "size=1280x720" \
+  --form-string "num_frames=81" \
+  --form-string "fps=24" \
+  --form-string "num_inference_steps=35" \
+  --form-string "guidance_scale=4.0" \
+  --form-string "flow_shift=10.0" \
+  --form-string "seed=42" \
+  --form-string 'extra_params={"guardrails":true,"use_resolution_template":false,"use_duration_template":false}' \
+  -F "input_reference=@input.png" \
+  | python -c 'import json, sys; print(json.load(sys.stdin)["id"])')
+# Poll once per second until the job completes; exit non-zero if it fails.
+while true; do
+  status=$(curl -sS "http://localhost:30000/v1/videos/${job_id}" \
+    | python -c 'import json, sys; print(json.load(sys.stdin)["status"])')
+  [ "$status" = "completed" ] && break
+  [ "$status" = "failed" ] && exit 1
+  sleep 1
+done
+# Download the finished video.
+curl -sS -L "http://localhost:30000/v1/videos/${job_id}/content" \
+  -o cosmos3_super_i2v_output.mp4
+```
+For complete serving instructions and request examples, see the [Cosmos3 SGLang cookbook](https://docs.sglang.io/cookbook/diffusion/Cosmos/Cosmos3).
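The shell loop in the request example polls until the job reports `completed` or `failed`, with no upper bound on waiting time. A minimal Python sketch of the same polling step with a timeout is given here, using only the standard library. It is an illustration, not part of the SGLang or Cosmos3 tooling: the helper names (`fetch_status`, `wait_for_video`, `download_video`), the default timeout, and the choice to treat every status other than `completed` and `failed` as "still running" are assumptions. The endpoints, the port, and the two terminal status strings are taken from the curl example.

```python
import json
import shutil
import time
import urllib.request

# Assumption: the server address used by the curl example.
BASE_URL = "http://localhost:30000"


def fetch_status(job_id, base_url=BASE_URL):
    """Return the "status" field of GET /v1/videos/{job_id}."""
    with urllib.request.urlopen(f"{base_url}/v1/videos/{job_id}") as resp:
        return json.load(resp)["status"]


def wait_for_video(job_id, get_status=fetch_status, poll_interval=1.0,
                   timeout=1800.0, sleep=time.sleep, clock=time.monotonic):
    """Poll until the job completes, fails, or the timeout elapses.

    Any status other than "completed" or "failed" is treated as pending
    (an assumption; the source only shows these two terminal values).
    """
    deadline = clock() + timeout
    while True:
        status = get_status(job_id)
        if status == "completed":
            return status
        if status == "failed":
            raise RuntimeError(f"video job {job_id} failed")
        if clock() >= deadline:
            raise TimeoutError(f"video job {job_id} still '{status}' after {timeout}s")
        sleep(poll_interval)


def download_video(job_id, out_path, base_url=BASE_URL):
    """Save GET /v1/videos/{job_id}/content to out_path (redirects are followed)."""
    url = f"{base_url}/v1/videos/{job_id}/content"
    with urllib.request.urlopen(url) as resp, open(out_path, "wb") as out:
        shutil.copyfileobj(resp, out)
```

With a job id obtained from the `POST /v1/videos` request, a call such as `wait_for_video(job_id)` followed by `download_video(job_id, "cosmos3_super_i2v_output.mp4")` would mirror the last two steps of the shell example. The `get_status`, `sleep`, and `clock` parameters exist only so the loop can be exercised without a running server.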
 ### Diffusers
 Cosmos3 is supported as an inference backend in Hugging Face Diffusers. Developers can add Cosmos3 capabilities, such as text-to-image generation, to their pipelines through the `Cosmos3OmniPipeline` class, as the provided code examples demonstrate (see the Hugging Face Cosmos3 page for examples of other modalities).
 ## Inference
+**Acceleration Engine:** [PyTorch](https://pytorch.org/), [vLLM](https://github.com/vllm-project/vllm), [vLLM-Omni](https://github.com/vllm-project/vllm-omni), [Hugging Face Diffusers](https://github.com/huggingface/diffusers), [SGLang](https://github.com/sgl-project/sglang), [SGLang Diffusion](https://docs.sglang.io/docs/sglang-diffusion/index)
 **Test Hardware:** GB200 and H100