Instructions to use nvidia/Cosmos3-Super-Image2Video with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Diffusers
How to use nvidia/Cosmos3-Super-Image2Video with Diffusers:
pip install -U diffusers transformers accelerate
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import load_image, export_to_video

# switch to "mps" for Apple devices
pipe = DiffusionPipeline.from_pretrained("nvidia/Cosmos3-Super-Image2Video", dtype=torch.bfloat16, device_map="cuda")
pipe.to("cuda")

prompt = "A man with short gray hair plays a red electric guitar."
image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/guitar-man.png"
)

output = pipe(image=image, prompt=prompt).frames[0]
export_to_video(output, "output.mp4")
- Notebooks
- Google Colab
- Kaggle
Add SGLang to model card
#7
by majchrow - opened
README.md
CHANGED
@@ -10,6 +10,8 @@ tags:
 - cosmos3
 - vllm-omni
 - diffusers
+- sglang
+- sglang-diffusion
 - image-to-video
 - video-generation
 countDownloads:
@@ -211,6 +213,7 @@ Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated sys
 - [PyTorch](https://github.com/nvidia/cosmos3)
 - [vLLM-Omni](https://github.com/vllm-project/vllm-omni)
 - [Hugging Face Diffusers](https://huggingface.co/docs/diffusers/en/index)
+- [SGLang](https://sgl-project.github.io/)
 
 **Supported Hardware Microarchitecture Compatibility:**
 
@@ -527,6 +530,12 @@ Example output generated by Diffusers:
 
 <video controls width="832" height="480" src="https://huggingface.co/nvidia/Cosmos3-Super-Image2Video/resolve/main/assets/example_output_diffusers.mp4"></video>
 
+### SGLang
+
+[SGLang Diffusion](https://sgl-project.github.io/diffusion) can serve `nvidia/Cosmos3-Super-Image2Video` through OpenAI-compatible video generation endpoints.
+
+For complete serving instructions and request examples, see the [Cosmos3 SGLang cookbook](https://lmsysorg.mintlify.app/cookbook/diffusion/Cosmos/Cosmos3).
+
 ## Limitations
 
 Cosmos3 may produce imperfect outputs in challenging scenarios. Generation artifacts include temporal inconsistency, unstable camera or object motion, imprecise physical interactions, inaccurate audio-video synchronization, and action-state drift — especially in long-horizon or high-resolution outputs. Reasoning may also be incorrect: object states, causal relationships, spatial geometry, temporal ordering, agent intent, and future outcomes can be misinferred, and complex or long-context inputs may yield hallucinated entities, inconsistent interpretations, or implausible predictions. Because the model lacks an explicit physics simulator, 3D geometry, 4D space-time evolution, object permanence, contact dynamics, and physical laws are only approximated — producing artifacts such as disappearing or morphing objects, unrealistic collisions, and physically implausible motions. Quality further degrades in out-of-distribution environments, safety-critical edge cases, and domains underrepresented in training.
@@ -535,7 +544,7 @@ Cosmos3 outputs should not be treated as physically accurate simulation, reliabl
 
 ## Inference
 
-**Acceleration Engine:** [PyTorch](https://pytorch.org/), [vLLM](https://github.com/vllm-project/vllm), [vLLM-Omni](https://github.com/vllm-project/vllm-omni), [Hugging Face Diffusers](https://github.com/huggingface/diffusers)
+**Acceleration Engine:** [PyTorch](https://pytorch.org/), [vLLM](https://github.com/vllm-project/vllm), [vLLM-Omni](https://github.com/vllm-project/vllm-omni), [Hugging Face Diffusers](https://github.com/huggingface/diffusers), [SGLang](https://sgl-project.github.io/), [SGLang Diffusion](https://sgl-project.github.io/diffusion)
 
 **Test Hardware:** GB200 and H100
 
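The new SGLang section says the model can be served through OpenAI-compatible video generation endpoints but gives no request shape, so here is a minimal sketch of what a client call might look like. Everything specific in it is an assumption rather than something stated in this PR: the host and port (`http://localhost:30000`), the `/v1/videos` route (borrowed from the OpenAI video API naming), and the `model`/`prompt` JSON fields. The input-image field that an image-to-video request would need is left out because its name is not given here; the Cosmos3 SGLang cookbook linked in the diff is the place to confirm the actual route and parameters.

```python
import json
import urllib.request

# Assumptions (not confirmed by the model card or this PR):
# the server address and the "/v1/videos" route are illustrative only.
BASE_URL = "http://localhost:30000"
VIDEOS_ROUTE = "/v1/videos"
MODEL_ID = "nvidia/Cosmos3-Super-Image2Video"


def build_video_request(prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a video generation job."""
    # Hypothetical payload: only "model" and "prompt"; the image input is omitted.
    payload = {"model": MODEL_ID, "prompt": prompt}
    return urllib.request.Request(
        url=base_url.rstrip("/") + VIDEOS_ROUTE,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit(request: urllib.request.Request, timeout: float = 600.0) -> dict:
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    req = build_video_request("A man with short gray hair plays a red electric guitar.")
    # Inspect the request without contacting any server.
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Building the request separately from sending it lets the payload be inspected or adjusted to match the cookbook before anything reaches a server; `submit(req)` would only be called once a server is actually running.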