Add SGLang serving instructions

Add SGLang-Diffusion model card examples and tags for Cosmos3 serving.

README.md (changed)

```diff
@@ -10,6 +10,8 @@ tags:
 - cosmos3
 - vllm
 - vllm-omni
+- sglang
+- sglang-diffusion
 - diffusers
 - text, image, video, audio, and action generation
 - omnimodel
```

`@@ -854,6 +856,67 @@ Example output from the command above:`

```
4. Place the flower into the red bottle.
```

### SGLang

SGLang-Diffusion can serve `nvidia/Cosmos3-Super` through OpenAI-compatible image and video endpoints. Install SGLang from source with the `diffusion` extras, add the `cosmos-guardrail` package, then start the server:

```bash
# Install SGLang from source with the diffusion extras
git clone https://github.com/sgl-project/sglang.git
cd sglang
pip install -e "python[diffusion]"
# Pinned Cosmos guardrail package
pip install "cosmos-guardrail==0.3.1"

# Serve nvidia/Cosmos3-Super across 4 GPUs
sglang serve \
  --model-path nvidia/Cosmos3-Super \
  --num-gpus 4
```

For the video-specialized checkpoint:

```bash
sglang serve \
  --model-path nvidia/Cosmos3-Super-Image2Video \
  --num-gpus 4
```

Supported SGLang endpoints:

| Mode | Endpoint | Notes |
| --- | --- | --- |
| Text to image | `POST /v1/images/generations` | Returns base64 image data by default |
| Text to video | `POST /v1/videos` | Creates an asynchronous job; poll `GET /v1/videos/{id}`, then download the result from `GET /v1/videos/{id}/content` |
| Image to video | `POST /v1/videos` | Upload the conditioning image with `input_reference` |
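
For the text-to-image endpoint, a minimal client can be sketched in Python with only the standard library. This is an illustrative sketch, not an official client: the helper names `build_image_request`, `decode_first_image`, and `generate_image` are hypothetical, and it assumes the server listens on `localhost:30000` (the port used in the text-to-video example), accepts a JSON body with `prompt` and `size`, and returns base64 data in the OpenAI Images API shape `{"data": [{"b64_json": "..."}]}`.

```python
# Sketch only: text-to-image against a local SGLang-Diffusion server.
# Assumptions (not stated in the model card): server on localhost:30000,
# JSON request body, and an OpenAI-style {"data": [{"b64_json": ...}]} response.
import base64
import json
import urllib.request

BASE_URL = "http://localhost:30000"


def build_image_request(prompt: str, size: str = "1280x720") -> urllib.request.Request:
    """Build a POST /v1/images/generations request with a JSON body."""
    body = json.dumps({"prompt": prompt, "size": size}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/images/generations",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def decode_first_image(response_json: dict) -> bytes:
    """Return the raw bytes of the first base64-encoded image in the response."""
    return base64.b64decode(response_json["data"][0]["b64_json"])


def generate_image(prompt: str, out_path: str) -> str:
    """Send the request, decode the first image, and write it to out_path."""
    with urllib.request.urlopen(build_image_request(prompt)) as resp:
        payload = json.load(resp)
    with open(out_path, "wb") as f:
        f.write(decode_first_image(payload))
    return out_path
```

For example, `generate_image("A small warehouse robot moves a blue box across a clean floor.", "cosmos3_super_t2i_output.png")`. The `.png` extension is a guess, since the endpoint notes do not say which image format the server encodes.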

Example text-to-video request:

```bash
# Submit the job; the response JSON carries the job id
job_id=$(curl -sS -X POST http://localhost:30000/v1/videos \
  --form-string "prompt=A small warehouse robot moves a blue box across a clean floor." \
  --form-string "negative_prompt=blurry, distorted, low quality" \
  --form-string "size=1280x720" \
  --form-string "num_frames=81" \
  --form-string "fps=24" \
  --form-string "num_inference_steps=35" \
  --form-string "guidance_scale=4.0" \
  --form-string "flow_shift=10.0" \
  --form-string "seed=42" \
  --form-string 'extra_params={"guardrails":true,"use_resolution_template":false,"use_duration_template":false}' \
  | python -c 'import json, sys; print(json.load(sys.stdin)["id"])')

# Poll once per second until the job completes or fails
while true; do
  status=$(curl -sS "http://localhost:30000/v1/videos/${job_id}" \
    | python -c 'import json, sys; print(json.load(sys.stdin)["status"])')
  [ "$status" = "completed" ] && break
  [ "$status" = "failed" ] && { echo "Video job ${job_id} failed" >&2; exit 1; }
  sleep 1
done

# Download the finished video
curl -sS -L "http://localhost:30000/v1/videos/${job_id}/content" \
  -o cosmos3_super_t2v_output.mp4
```
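
The same submit, poll, and download sequence can be sketched in Python with only the standard library. This is an illustrative sketch, not an official client: `encode_multipart`, `wait_for_job`, `fetch_status`, and `generate_video` are hypothetical helper names, and it assumes the server runs on `localhost:30000`, that the job JSON exposes the `id` and `status` fields the curl example reads, and that for image-to-video the conditioning image is sent as a multipart file in the `input_reference` field.

```python
# Sketch only: a stdlib port of the curl text-to-video flow.
# Assumptions: server on localhost:30000; job JSON has "id" and "status" fields;
# an optional conditioning image goes in the multipart field "input_reference".
import json
import mimetypes
import os
import time
import urllib.request
import uuid
from typing import Callable, Dict, Optional, Tuple

BASE_URL = "http://localhost:30000"


def encode_multipart(fields: Dict[str, object],
                     files: Optional[Dict[str, Tuple[str, bytes]]] = None) -> Tuple[bytes, str]:
    """Encode text fields and (filename, bytes) files as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        # Plain text parts, equivalent to curl --form-string name=value
        parts.append((f"--{boundary}\r\n"
                      f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                      f"{value}\r\n").encode("utf-8"))
    for name, (filename, data) in (files or {}).items():
        mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        header = (f"--{boundary}\r\n"
                  f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
                  f"Content-Type: {mime}\r\n\r\n")
        parts.append(header.encode("utf-8") + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def wait_for_job(job_id: str, get_status: Callable[[str], str],
                 poll_seconds: float = 1.0, timeout_seconds: float = 3600.0) -> None:
    """Poll until the job reports "completed"; raise on "failed" or on timeout."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        status = get_status(job_id)
        if status == "completed":
            return
        if status == "failed":
            raise RuntimeError(f"video job {job_id} failed")
        time.sleep(poll_seconds)
    raise TimeoutError(f"video job {job_id} did not finish within {timeout_seconds} s")


def fetch_status(job_id: str) -> str:
    """GET /v1/videos/{id} and return its "status" field."""
    with urllib.request.urlopen(f"{BASE_URL}/v1/videos/{job_id}") as resp:
        return json.load(resp)["status"]


def generate_video(fields: Dict[str, object], out_path: str,
                   image_path: Optional[str] = None) -> str:
    """Submit a job, wait for it, and save its /content to out_path."""
    files = None
    if image_path is not None:  # image-to-video: attach the conditioning image
        with open(image_path, "rb") as f:
            files = {"input_reference": (os.path.basename(image_path), f.read())}
    body, content_type = encode_multipart(fields, files)
    req = urllib.request.Request(f"{BASE_URL}/v1/videos", data=body,
                                 headers={"Content-Type": content_type}, method="POST")
    with urllib.request.urlopen(req) as resp:
        job_id = json.load(resp)["id"]
    wait_for_job(job_id, fetch_status)
    # urlopen follows HTTP redirects by default, like `curl -L` above
    with urllib.request.urlopen(f"{BASE_URL}/v1/videos/{job_id}/content") as resp:
        with open(out_path, "wb") as out:
            out.write(resp.read())
    return out_path
```

For example, build `fields` from the same values as the curl request (`"prompt"`, `"size": "1280x720"`, `"num_frames": 81`, and so on, with `"extra_params"` set to a `json.dumps(...)` string) and call `generate_video(fields, "cosmos3_super_t2v_output.mp4")`; for image-to-video, also pass `image_path=` pointing at a local conditioning image. Unlike the shell loop, the sketch stops after a timeout instead of polling forever.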

Video-to-video, video-with-sound, and action generation are not yet supported by SGLang.

### Diffusers

#### Container