Cosmos3-Super-Text2Image SDNQ INT8 Transformer
This repository contains a transformer-only SDNQ quantization for nvidia/Cosmos3-Super-Text2Image.
It does not repeat the original model card. Read NVIDIA's model card, prompt-format guidance, license, and safety notes here: nvidia/Cosmos3-Super-Text2Image.
Only transformer/ is provided as a weight artifact. The VAE, scheduler, tokenizers, safety checker, and other components are loaded from the base model.
The quantization format comes from Disty0/sdnq. SD.Next's quantization overview is here: vladmandic/sdnext Quantization.
Recipe
| Setting | Value |
|---|---|
| Weights dtype | int8 |
| Static quantization | True |
| Dynamic quantization | False |
| SVD | False |
| SVD rank / steps | 32 / 8 |
| Quantized matmul | True |
| Dequantize FP32 | True |
| Quantized conv / embedding | False / False |
Quantization run: 20.90s; save time: 85.31s; transformer safetensors: 61.17 GiB.
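For readers unfamiliar with INT8 weight quantization, the following is a minimal, generic sketch of symmetric per-row quantization and dequantization in plain Python. It is not SDNQ's implementation, and the function names `quantize_row_int8` and `dequantize_row` are hypothetical; it only illustrates the general idea of storing int8 codes plus a floating-point scale.

```python
def quantize_row_int8(row):
    """Symmetric int8 quantization of one weight row; returns (codes, scale)."""
    max_abs = max(abs(w) for w in row)
    # One scale per row maps the largest magnitude onto the int8 limit 127.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in row]
    return codes, scale


def dequantize_row(codes, scale):
    """Recover approximate floating-point weights from int8 codes."""
    return [c * scale for c in codes]


weights = [0.50, -1.27, 0.03, 0.89]
codes, scale = quantize_row_int8(weights)
restored = dequantize_row(codes, scale)

# Rounding to the nearest code bounds the per-weight error by half a step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(codes, scale, max_error)
```

Each weight then costs one byte instead of two for BF16, plus a small amount of storage for the scales.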
Assemble The Pipeline
```python
import json

import torch
from diffusers import Cosmos3OmniPipeline, Cosmos3OmniTransformer
from diffusers.schedulers.scheduling_unipc_multistep import UniPCMultistepScheduler
from huggingface_hub import snapshot_download
from sdnq.loader import load_sdnq_model

# Download this repository; load_sdnq_model needs a local path.
snapshot_path = snapshot_download("WaveCut/Cosmos3-Super-Text2Image-SDNQ-Int8-Transformer")

# Load the quantized transformer from the transformer/ subfolder.
transformer = load_sdnq_model(
    f"{snapshot_path}/transformer",
    model_cls=Cosmos3OmniTransformer,
    dtype=torch.bfloat16,
    device=torch.device("cuda"),
    dequantize_fp32=True,
    use_quantized_matmul=True,
)

# All other components (VAE, scheduler, tokenizers, safety checker) come from the base model.
pipe = Cosmos3OmniPipeline.from_pretrained(
    "nvidia/Cosmos3-Super-Text2Image",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
    device_map="cuda",
    enable_safety_checker=True,
)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config, flow_shift=3.0)
pipe.to("cuda")

# The prompt is a JSON caption serialized to a string.
json_caption = {
    "subjects": [],
    "background_setting": "A concise scene description.",
    "comprehensive_t2i_caption": "A detailed natural-language caption.",
    "resolution": {"H": 1024, "W": 1024},
    "aspect_ratio": "1,1",
}

result = pipe(
    prompt=json.dumps(json_caption),
    negative_prompt="",
    num_frames=1,
    height=1024,
    width=1024,
    num_inference_steps=50,
    guidance_scale=4.0,
    generator=torch.Generator(device="cuda").manual_seed(1143),
)
result.video[0].save("cosmos3_sdnq_int8.png")
```
load_sdnq_model expects a local path. Download this repository first, or use huggingface_hub.snapshot_download("WaveCut/Cosmos3-Super-Text2Image-SDNQ-Int8-Transformer") and pass snapshot_path + "/transformer".
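In the example above, the `resolution` field of the JSON caption matches the `height` and `width` passed to the pipeline. Whether the model requires them to agree is an assumption here, not something stated by this card, but a small check can catch accidental mismatches before a 50-step run. The helper name `check_caption` is hypothetical.

```python
import json


def check_caption(caption_json, height, width):
    """Return a list of mismatches between the caption resolution and the pipeline size."""
    caption = json.loads(caption_json)
    resolution = caption.get("resolution", {})
    problems = []
    if resolution.get("H") != height:
        problems.append(f"resolution.H={resolution.get('H')} but height={height}")
    if resolution.get("W") != width:
        problems.append(f"resolution.W={resolution.get('W')} but width={width}")
    return problems


caption_json = json.dumps({
    "subjects": [],
    "background_setting": "A concise scene description.",
    "comprehensive_t2i_caption": "A detailed natural-language caption.",
    "resolution": {"H": 1024, "W": 1024},
    "aspect_ratio": "1,1",
})

matching = check_caption(caption_json, height=1024, width=1024)
mismatched = check_caption(caption_json, height=768, width=1024)
print(matching, mismatched)
```

An empty list means the caption and the pipeline arguments agree.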
Benchmarks
Measured on one RunPod NVIDIA B200 instance with local container storage, cached model files, PyTorch 2.9.1+cu130, 1024x1024 image generation, 50 inference steps, guidance scale 4.0, flow_shift=3.0, system prompt enabled.
Transformer Component Load
| Variant | Load to CUDA | VRAM after load | Torch allocated | Torch reserved | Transformer safetensors |
|---|---|---|---|---|---|
| BF16 base transformer | 22.87s | 122,760 MiB | 122,121 MiB | 122,132 MiB | 119.21 GiB |
| SDNQ INT8 transformer | 16.50s | 63,920 MiB | 63,018 MiB | 63,200 MiB | 61.17 GiB |
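The file sizes in the table can be sanity-checked with a little arithmetic. The sketch below assumes every weight in the base transformer is stored as 2-byte BF16, which gives roughly 64 billion parameters, and compares the INT8 file with an ideal 1-byte-per-weight size. Attributing the remainder to quantization scales and unquantized layers is an assumption, not a measurement.

```python
GIB = 1024 ** 3

bf16_file_gib = 119.21   # BF16 transformer safetensors, from the table
int8_file_gib = 61.17    # SDNQ INT8 transformer safetensors, from the table

# Assumption: all base weights are BF16 (2 bytes each).
approx_params = bf16_file_gib * GIB / 2

# An ideal INT8 file with 1 byte per weight and no extra tensors.
ideal_int8_gib = bf16_file_gib / 2
overhead_gib = int8_file_gib - ideal_int8_gib

size_ratio = int8_file_gib / bf16_file_gib
vram_ratio = 63920 / 122760  # "VRAM after load" column, INT8 / BF16

print(f"approx. parameters: {approx_params / 1e9:.1f} B")
print(f"overhead over ideal INT8: {overhead_gib:.2f} GiB")
print(f"file size ratio: {size_ratio:.3f}, VRAM ratio: {vram_ratio:.3f}")
```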
Full Pipeline Generation
The stress set consists of ten handwritten JSON-caption prompts that target Cyrillic text, reflections, multi-object composition, anatomy, small details, and scene-following.
| Variant | Full pipeline load | VRAM after load | Torch allocated after load | Avg generation time | Min / max generation time | Peak sampled VRAM | Images |
|---|---|---|---|---|---|---|---|
| BF16 base pipeline | 31.31s | 125,134 MiB | 124,386 MiB | 16.05s | 15.51s / 17.97s | 141,104 MiB | 10 |
| SDNQ INT8 pipeline | 26.79s | 67,268 MiB | 66,528 MiB | 25.51s | 21.57s / 36.53s | 83,202 MiB | 10 |
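The trade-off in this table can be summarized as a slowdown factor and a peak-memory saving. The sketch below only restates the table's numbers as ratios; the variable names are illustrative.

```python
# Values copied from the full-pipeline table above.
bf16 = {"avg_s": 16.05, "peak_mib": 141104}
int8 = {"avg_s": 25.51, "peak_mib": 83202}

# How much longer an average generation takes with the INT8 transformer.
slowdown = int8["avg_s"] / bf16["avg_s"]

# Peak sampled VRAM saved, in MiB and as a fraction of the BF16 peak.
peak_saved_mib = bf16["peak_mib"] - int8["peak_mib"]
peak_ratio = int8["peak_mib"] / bf16["peak_mib"]

print(f"slowdown: {slowdown:.2f}x")
print(f"peak VRAM saved: {peak_saved_mib} MiB ({peak_saved_mib / 1024:.1f} GiB)")
print(f"INT8 peak as a fraction of BF16 peak: {peak_ratio:.3f}")
```

On this benchmark, the INT8 variant trades about 1.6x longer average generation for a peak VRAM footprint of about 59% of the BF16 pipeline.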
Original NVIDIA Example Caption
The original model repository provides assets/example_caption.json. The images below were generated locally from that JSON caption with seed 1143, 1024x1024 resolution, 50 steps, and guidance scale 4.0.
| Variant | Pipeline load | Generation time | Peak sampled VRAM |
|---|---|---|---|
| BF16 base pipeline | 35.41s | 18.01s | 141,098 MiB |
| SDNQ INT8 pipeline | 25.79s | 66.05s | 83,218 MiB |
BF16 reference output:
SDNQ INT8 output:
Stress Prompt Examples
The following ten images use the same handwritten stress prompt set and seeds as the benchmark table.
Notes
This repository is an independent transformer-only quantization artifact. NVIDIA's original card states that Cosmos3-Super-Text2Image was tested in BF16; this SDNQ artifact should be treated as an experimental deployment variant and evaluated for each workload.