---
base_model: nvidia/Cosmos3-Super-Text2Image
library_name: diffusers
pipeline_tag: text-to-image
tags:
- cosmos3
- diffusers
- sdnq
- text-to-image
- int8
license: other
license_name: openmdw1.1-license
license_link: https://openmdw.ai/license/1-1/
---

# Cosmos3-Super-Text2Image SDNQ INT8 Transformer

This repository contains a transformer-only SDNQ quantization for [nvidia/Cosmos3-Super-Text2Image](https://huggingface.co/nvidia/Cosmos3-Super-Text2Image).

It does not repeat the original model card. Read NVIDIA's model card, prompt-format guidance, license, and safety notes here:
[nvidia/Cosmos3-Super-Text2Image](https://huggingface.co/nvidia/Cosmos3-Super-Text2Image).

Only `transformer/` is provided as a weight artifact. The VAE, scheduler, tokenizers, safety checker, and other components are loaded from the base model.

The quantization format comes from [Disty0/sdnq](https://github.com/Disty0/sdnq). SD.Next's quantization overview is here:
[vladmandic/sdnext Quantization](https://github.com/vladmandic/sdnext/wiki/Quantization).

## Recipe

| Setting | Value |
| --- | --- |
| Weights dtype | `int8` |
| Static quantization | `True` |
| Dynamic quantization | `False` |
| SVD | `False` |
| SVD rank / steps | `32` / `8` |
| Quantized matmul | `True` |
| Dequantize FP32 | `True` |
| Quantized conv / embedding | `False` / `False` |

Quantization run: 20.90s; save time: 85.31s; transformer safetensors: 61.17 GiB.
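
For readers new to weight-only quantization, the sketch below illustrates what static symmetric INT8 quantization of a weight row generally looks like: pick one scale, round to integers in [-127, 127], and multiply back at use time. It is a pure-Python illustration that assumes a single scale per row; `quantize_int8` and `dequantize` are hypothetical helper names, and SDNQ's actual grouping, scale layout, and kernels may differ.

```python
def quantize_int8(weights):
    """Symmetric INT8 quantization with one scale for the whole row (illustrative only)."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero row, where any scale reproduces the zeros.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp into the symmetric INT8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate floating-point weights from the stored integers and scale."""
    return [v * scale for v in q]


row = [0.42, -1.27, 0.003, 0.9]
q, scale = quantize_int8(row)
restored = dequantize(q, scale)
# The round-trip error per weight is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(row, restored))
print(q, scale, max_error)
```

Storing one byte per weight plus a small number of scales is what roughly halves the file size relative to BF16, at the cost of the rounding error measured above.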

## Assemble The Pipeline

```python
import json

import torch
from diffusers import Cosmos3OmniPipeline, Cosmos3OmniTransformer
from diffusers.schedulers.scheduling_unipc_multistep import UniPCMultistepScheduler
from huggingface_hub import snapshot_download
from sdnq.loader import load_sdnq_model

# load_sdnq_model needs a local path, so fetch the repository snapshot first.
snapshot_path = snapshot_download("WaveCut/Cosmos3-Super-Text2Image-SDNQ-Int8-Transformer")

# Load the INT8 transformer with the same flags as the recipe table.
transformer = load_sdnq_model(
    f"{snapshot_path}/transformer",
    model_cls=Cosmos3OmniTransformer,
    dtype=torch.bfloat16,
    device=torch.device("cuda"),
    dequantize_fp32=True,
    use_quantized_matmul=True,
)

# All other components (VAE, scheduler, tokenizers, safety checker) come from the base model.
pipe = Cosmos3OmniPipeline.from_pretrained(
    "nvidia/Cosmos3-Super-Text2Image",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
    device_map="cuda",
    enable_safety_checker=True,
)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config, flow_shift=3.0)
pipe.to("cuda")

# The prompt is a JSON caption; see NVIDIA's model card for the full format.
json_caption = {
    "subjects": [],
    "background_setting": "A concise scene description.",
    "comprehensive_t2i_caption": "A detailed natural-language caption.",
    "resolution": {"H": 1024, "W": 1024},
    "aspect_ratio": "1,1",
}

result = pipe(
    prompt=json.dumps(json_caption),
    negative_prompt="",
    num_frames=1,
    height=1024,
    width=1024,
    num_inference_steps=50,
    guidance_scale=4.0,
    generator=torch.Generator(device="cuda").manual_seed(1143),
)
result.video[0].save("cosmos3_sdnq_int8.png")
```

`load_sdnq_model` expects a local path. Either download this repository manually, or call `huggingface_hub.snapshot_download("WaveCut/Cosmos3-Super-Text2Image-SDNQ-Int8-Transformer")` and pass `snapshot_path + "/transformer"`.
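
Because `load_sdnq_model` expects a local directory, it can help to confirm that the snapshot really contains `transformer/` before starting a slow load. The following is a minimal sketch using only the standard library; `resolve_transformer_dir` is a hypothetical helper, not part of `sdnq` or `huggingface_hub`.

```python
from pathlib import Path


def resolve_transformer_dir(snapshot_path):
    """Return the transformer/ folder inside a local snapshot, failing early if it is missing."""
    transformer_dir = Path(snapshot_path) / "transformer"
    if not transformer_dir.is_dir():
        raise FileNotFoundError(f"No transformer/ folder under {snapshot_path}")
    return transformer_dir
```

The result can be passed as `str(resolve_transformer_dir(snapshot_path))` in place of `f"{snapshot_path}/transformer"`.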

## Benchmarks

Measured on one RunPod NVIDIA B200 instance with local container storage, cached model files, PyTorch `2.9.1+cu130`, 1024x1024 image generation, 50 inference steps, guidance scale 4.0, `flow_shift=3.0`, system prompt enabled.

### Transformer Component Load

| Variant | Load to CUDA | VRAM after load | Torch allocated | Torch reserved | Transformer safetensors |
| --- | ---: | ---: | ---: | ---: | ---: |
| BF16 base transformer | 22.87s | 122,760 MiB | 122,121 MiB | 122,132 MiB | 119.21 GiB |
| SDNQ INT8 transformer | 16.50s | 63,920 MiB | 63,018 MiB | 63,200 MiB | 61.17 GiB |

### Full Pipeline Generation

The stress set consists of ten handwritten JSON-caption prompts designed to probe Cyrillic text, reflections, multi-object composition, anatomy, small details, and scene-following.

| Variant | Full pipeline load | VRAM after load | Torch allocated after load | Avg generation time | Min / max generation time | Peak sampled VRAM | Images |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| BF16 base pipeline | 31.31s | 125,134 MiB | 124,386 MiB | 16.05s | 15.51s / 17.97s | 141,104 MiB | 10 |
| SDNQ INT8 pipeline | 26.79s | 67,268 MiB | 66,528 MiB | 25.51s | 21.57s / 36.53s | 83,202 MiB | 10 |
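
The trade-off in the two tables above can be summarised with a little arithmetic. The sketch below only recomputes ratios from the reported figures; the dictionary keys and the `reduction` helper are illustrative names, not part of any benchmark tooling.

```python
# Figures copied from the benchmark tables (MiB for memory, seconds for time, GiB for file size).
bf16 = {"peak_vram": 141_104, "avg_gen_s": 16.05, "transformer_gib": 119.21}
int8 = {"peak_vram": 83_202, "avg_gen_s": 25.51, "transformer_gib": 61.17}


def reduction(before, after):
    """Fractional saving of `after` relative to `before`."""
    return 1.0 - after / before


size_saving = reduction(bf16["transformer_gib"], int8["transformer_gib"])
peak_vram_saving = reduction(bf16["peak_vram"], int8["peak_vram"])
slowdown = int8["avg_gen_s"] / bf16["avg_gen_s"]

print(f"Transformer file size: {size_saving:.1%} smaller")
print(f"Peak sampled VRAM:     {peak_vram_saving:.1%} lower")
print(f"Average generation:    {slowdown:.2f}x the BF16 time")
```

On these numbers, INT8 roughly halves the transformer file size and lowers peak sampled VRAM by about 41%, while average generation on the stress set takes about 1.59 times as long as BF16.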

### Original NVIDIA Example Caption

The original model repository provides [`assets/example_caption.json`](https://huggingface.co/nvidia/Cosmos3-Super-Text2Image/blob/main/assets/example_caption.json). The images below are generated locally with the same JSON caption, seed 1143, 1024x1024, 50 steps, guidance scale 4.0.

| Variant | Pipeline load | Generation time | Peak sampled VRAM |
| --- | ---: | ---: | ---: |
| BF16 base pipeline | 35.41s | 18.01s | 141,098 MiB |
| SDNQ INT8 pipeline | 25.79s | 66.05s | 83,218 MiB |

BF16 reference output:



SDNQ INT8 output:



## Stress Prompt Examples

The following ten images use the same handwritten stress prompt set and seeds as the benchmark table.





















## Notes

This repository is an independent transformer-only quantization artifact. NVIDIA's original card states that Cosmos3-Super-Text2Image was tested in BF16; this SDNQ artifact should be treated as an experimental deployment variant and evaluated for each workload.