nvidia/Cosmos3-Super-Text2Image

@@ -10,6 +10,8 @@ tags:
   - cosmos3
   - vllm-omni
   - diffusers
   - text-to-image
   - image-generation
 countDownloads:
@@ -211,6 +213,7 @@ Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated sys
 - [PyTorch](https://github.com/nvidia/cosmos3)
 - [vLLM-Omni](https://github.com/vllm-project/vllm-omni)
 - [Hugging Face Diffusers](https://huggingface.co/docs/diffusers/en/index)
 **Supported Hardware Microarchitecture Compatibility:**
@@ -485,6 +488,12 @@ result.video[0].save("/tmp/cosmos3_t2i.png")
 print("Saved image to /tmp/cosmos3_t2i.png")
 ```
 ## Limitations
 Cosmos3 may produce imperfect outputs in challenging scenarios. Generation artifacts include temporal inconsistency, unstable camera or object motion, imprecise physical interactions, inaccurate audio-video synchronization, and action-state drift — especially in long-horizon or high-resolution outputs. Reasoning may also be incorrect: object states, causal relationships, spatial geometry, temporal ordering, agent intent, and future outcomes can be misinferred, and complex or long-context inputs may yield hallucinated entities, inconsistent interpretations, or implausible predictions. Because the model lacks an explicit physics simulator, 3D geometry, 4D space-time evolution, object permanence, contact dynamics, and physical laws are only approximated — producing artifacts such as disappearing or morphing objects, unrealistic collisions, and physically implausible motions. Quality further degrades in out-of-distribution environments, safety-critical edge cases, and domains underrepresented in training.
@@ -493,7 +502,7 @@ Cosmos3 outputs should not be treated as physically accurate simulation, reliabl
 ## Inference
-**Acceleration Engine:** [PyTorch](https://pytorch.org/), [vLLM](https://github.com/vllm-project/vllm), [vLLM-Omni](https://github.com/vllm-project/vllm-omni), [Hugging Face Diffusers](https://github.com/huggingface/diffusers)
 **Test Hardware:** GB200 and H100

   - cosmos3
   - vllm-omni
   - diffusers
+  - sglang
+  - sglang-diffusion
   - text-to-image
   - image-generation
 countDownloads:
 - [PyTorch](https://github.com/nvidia/cosmos3)
 - [vLLM-Omni](https://github.com/vllm-project/vllm-omni)
 - [Hugging Face Diffusers](https://huggingface.co/docs/diffusers/en/index)
+- [SGLang](https://sgl-project.github.io/)
 **Supported Hardware Microarchitecture Compatibility:**
 print("Saved image to /tmp/cosmos3_t2i.png")
 ```
+### SGLang
+[SGLang Diffusion](https://sgl-project.github.io/diffusion) can serve `nvidia/Cosmos3-Super-Text2Image` through OpenAI-compatible image generation endpoints.
+For complete serving instructions and request examples, see the [Cosmos3 SGLang cookbook](https://lmsysorg.mintlify.app/cookbook/diffusion/Cosmos/Cosmos3).
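As a rough illustration of what an OpenAI-compatible image generation request could look like, here is a minimal sketch using only the Python standard library. It is not taken from the cookbook. It assumes an SGLang Diffusion server is already running, that it listens at `http://localhost:30000`, and that it accepts the standard OpenAI `/v1/images/generations` route with `response_format: "b64_json"`. The helper names `build_image_request`, `decode_first_image`, and `generate_image` are hypothetical; check the cookbook for the actual launch command and supported parameters.

```python
import base64
import json
import urllib.request

# Assumption: host and port of a locally running SGLang Diffusion server.
BASE_URL = "http://localhost:30000"
MODEL_ID = "nvidia/Cosmos3-Super-Text2Image"


def build_image_request(prompt, base_url=BASE_URL):
    """Build a POST request in the OpenAI image-generation request format."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "n": 1,
        # Assumption: the server can return the image inline as base64.
        "response_format": "b64_json",
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/images/generations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def decode_first_image(response_body):
    """Decode the first base64-encoded image from an OpenAI-style response body."""
    parsed = json.loads(response_body)
    return base64.b64decode(parsed["data"][0]["b64_json"])


def generate_image(prompt, out_path):
    """Send the request to the server and write the returned image bytes to out_path."""
    request = build_image_request(prompt)
    with urllib.request.urlopen(request) as response:
        image_bytes = decode_first_image(response.read())
    with open(out_path, "wb") as f:
        f.write(image_bytes)
```

With a server running, calling `generate_image("A robot arm stacking red blocks", "/tmp/cosmos3_sglang.png")` would write the returned image to disk. Building the request and decoding the response are kept as separate functions so each can be checked without a live server.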
 ## Limitations
 Cosmos3 may produce imperfect outputs in challenging scenarios. Generation artifacts include temporal inconsistency, unstable camera or object motion, imprecise physical interactions, inaccurate audio-video synchronization, and action-state drift — especially in long-horizon or high-resolution outputs. Reasoning may also be incorrect: object states, causal relationships, spatial geometry, temporal ordering, agent intent, and future outcomes can be misinferred, and complex or long-context inputs may yield hallucinated entities, inconsistent interpretations, or implausible predictions. Because the model lacks an explicit physics simulator, 3D geometry, 4D space-time evolution, object permanence, contact dynamics, and physical laws are only approximated — producing artifacts such as disappearing or morphing objects, unrealistic collisions, and physically implausible motions. Quality further degrades in out-of-distribution environments, safety-critical edge cases, and domains underrepresented in training.
 ## Inference
+**Acceleration Engine:** [PyTorch](https://pytorch.org/), [vLLM](https://github.com/vllm-project/vllm), [vLLM-Omni](https://github.com/vllm-project/vllm-omni), [Hugging Face Diffusers](https://github.com/huggingface/diffusers), [SGLang](https://sgl-project.github.io/), [SGLang Diffusion](https://sgl-project.github.io/diffusion)
 **Test Hardware:** GB200 and H100