---
base_model: nvidia/Cosmos3-Super-Text2Image
library_name: diffusers
pipeline_tag: text-to-image
tags:
  - cosmos3
  - diffusers
  - sdnq
  - text-to-image
  - int8
license: other
license_name: openmdw1.1-license
license_link: https://openmdw.ai/license/1-1/
---

# Cosmos3-Super-Text2Image SDNQ INT8 Transformer

This repository contains a transformer-only SDNQ quantization for [nvidia/Cosmos3-Super-Text2Image](https://huggingface.co/nvidia/Cosmos3-Super-Text2Image).

It does not repeat the original model card. Read NVIDIA's model card, prompt-format guidance, license, and safety notes here:
[nvidia/Cosmos3-Super-Text2Image](https://huggingface.co/nvidia/Cosmos3-Super-Text2Image).

Only `transformer/` is provided as a weight artifact. The VAE, scheduler, tokenizers, safety checker, and other components are loaded from the base model.

The quantization format comes from [Disty0/sdnq](https://github.com/Disty0/sdnq). SD.Next's quantization overview is here:
[vladmandic/sdnext Quantization](https://github.com/vladmandic/sdnext/wiki/Quantization).

## Recipe

| Setting | Value |
| --- | --- |
| Weights dtype | `int8` |
| Static quantization | `True` |
| Dynamic quantization | `False` |
| SVD | `False` |
| SVD rank / steps | `32` / `8` (not applied; SVD disabled) |
| Quantized matmul | `True` |
| Dequantize FP32 | `True` |
| Quantized conv / embedding | `False` / `False` |

Quantization run: 20.90s; save time: 85.31s; transformer safetensors: 61.17 GiB.
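
The `int8` row means the quantized transformer weights are stored as 8-bit integers, typically alongside floating-point scales. The sketch below illustrates the general idea with symmetric, per-output-row quantization in plain Python. It is a generic textbook scheme chosen for illustration, not SDNQ's implementation: the row-wise scaling, zero-point of 0, and round-to-nearest are assumptions, and SDNQ may group, round, and store values differently. Reading `Dequantize FP32` as "reconstruct weights at FP32 precision" is likewise an assumption; check the sdnq source for the exact behavior.

```python
"""Generic symmetric per-row int8 weight quantization (illustrative only, not SDNQ's code)."""

from typing import List, Tuple

Matrix = List[List[float]]


def quantize_int8(weight: Matrix) -> Tuple[List[List[int]], List[float]]:
    """Quantize each output row to int8 with its own scale (zero-point fixed at 0)."""
    q_rows: List[List[int]] = []
    scales: List[float] = []
    for row in weight:
        max_abs = max(abs(v) for v in row)
        # Map the largest magnitude in the row to 127; guard against all-zero rows.
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        q_row = [max(-127, min(127, round(v / scale))) for v in row]
        q_rows.append(q_row)
        scales.append(scale)
    return q_rows, scales


def dequantize(q_rows: List[List[int]], scales: List[float]) -> Matrix:
    """Reconstruct float weights; Python floats are 64-bit, i.e. at least FP32 precision."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]


if __name__ == "__main__":
    w = [[0.5, -1.0, 0.25], [2.0, 0.0, -0.75]]
    q, s = quantize_int8(w)
    print(q)                 # integer codes in [-127, 127]
    print(dequantize(q, s))  # close to w; per-element error is at most scale / 2
```

Each row pays for one extra scale value, which is part of why an 8-bit checkpoint ends up slightly larger than half of its 16-bit source.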

## Assemble The Pipeline

```python
import json
import torch
from diffusers import Cosmos3OmniPipeline, Cosmos3OmniTransformer
from diffusers.schedulers.scheduling_unipc_multistep import UniPCMultistepScheduler
from huggingface_hub import snapshot_download
from sdnq.loader import load_sdnq_model

# load_sdnq_model needs a local directory, so fetch this repository first.
snapshot_path = snapshot_download("WaveCut/Cosmos3-Super-Text2Image-SDNQ-Int8-Transformer")

# Load only the quantized transformer; these flags mirror the recipe table above.
transformer = load_sdnq_model(
    f"{snapshot_path}/transformer",
    model_cls=Cosmos3OmniTransformer,
    dtype=torch.bfloat16,
    device=torch.device("cuda"),
    dequantize_fp32=True,
    use_quantized_matmul=True,
)

# VAE, scheduler, tokenizers, and safety checker are loaded from the base model.
pipe = Cosmos3OmniPipeline.from_pretrained(
    "nvidia/Cosmos3-Super-Text2Image",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
    device_map="cuda",
    enable_safety_checker=True,
)
# The benchmarks on this card use flow_shift=3.0.
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config, flow_shift=3.0)
pipe.to("cuda")

# The prompt is a JSON caption; NVIDIA's model card documents the full format.
json_caption = {
    "subjects": [],
    "background_setting": "A concise scene description.",
    "comprehensive_t2i_caption": "A detailed natural-language caption.",
    "resolution": {"H": 1024, "W": 1024},
    "aspect_ratio": "1,1",
}

result = pipe(
    prompt=json.dumps(json_caption),
    negative_prompt="",
    num_frames=1,  # a single frame produces one still image
    height=1024,
    width=1024,
    num_inference_steps=50,
    guidance_scale=4.0,
    generator=torch.Generator(device="cuda").manual_seed(1143),
)
# Save the single generated frame as a PNG.
result.video[0].save("cosmos3_sdnq_int8.png")
```

`load_sdnq_model` expects a local directory rather than a Hub repo id, which is why the example first downloads this repository with `huggingface_hub.snapshot_download` and passes the `transformer/` subfolder (`f"{snapshot_path}/transformer"`). If you already have a local copy, pass its `transformer/` directory instead.
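
If you already have a local copy of this repository, or want to skip downloading the example images, a small helper can resolve the `transformer/` directory first. This is a sketch: `resolve_transformer_dir` and its `local_root` argument are hypothetical names, while `snapshot_download(..., allow_patterns=...)` is a standard `huggingface_hub` option that restricts which files are fetched.

```python
"""Resolve a local transformer/ directory for load_sdnq_model (sketch)."""

import os
from typing import Optional

REPO_ID = "WaveCut/Cosmos3-Super-Text2Image-SDNQ-Int8-Transformer"


def resolve_transformer_dir(local_root: Optional[str] = None, repo_id: str = REPO_ID) -> str:
    """Return a local transformer/ path, downloading only that folder when no copy is given."""
    if local_root is not None:
        # Hypothetical convenience: reuse an existing local copy of this repository.
        candidate = os.path.join(local_root, "transformer")
        if not os.path.isdir(candidate):
            raise FileNotFoundError(f"no transformer/ directory under {local_root!r}")
        return candidate

    # Imported lazily so the local-copy branch works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    # allow_patterns limits the download to the transformer folder (skips examples/ images).
    snapshot_path = snapshot_download(repo_id, allow_patterns=["transformer/*"])
    return os.path.join(snapshot_path, "transformer")
```

The returned path can be passed as the first argument to `load_sdnq_model` in place of `f"{snapshot_path}/transformer"`.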

## Benchmarks

Measured on one RunPod NVIDIA B200 instance with local container storage, cached model files, PyTorch `2.9.1+cu130`, 1024x1024 image generation, 50 inference steps, guidance scale 4.0, `flow_shift=3.0`, system prompt enabled.
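
This card does not include its benchmark script. As an illustration only, the sketch below shows one way to collect the kinds of numbers reported here: wall-clock time per generation with `time.perf_counter`, average/min/max aggregation, and a "peak sampled" memory figure from a background thread that polls a memory reader at a fixed interval. `generate` and `read_vram_mib` are hypothetical callables; in a real run they could wrap `pipe(...)` and a query such as `torch.cuda.memory_reserved() // 2**20` or an NVML reading. Because the peak is sampled, short spikes between polls can be missed.

```python
"""Illustrative benchmark harness: per-call timing plus sampled peak memory."""

import statistics
import threading
import time
from typing import Callable, Dict, List


class PeakSampler:
    """Polls `read_mib` in a background thread and keeps the largest value seen."""

    def __init__(self, read_mib: Callable[[], int], interval_s: float = 0.05) -> None:
        self._read_mib = read_mib
        self._interval_s = interval_s
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self.peak_mib = 0

    def _run(self) -> None:
        while not self._stop.is_set():
            self.peak_mib = max(self.peak_mib, self._read_mib())
            self._stop.wait(self._interval_s)

    def __enter__(self) -> "PeakSampler":
        self._thread.start()
        return self

    def __exit__(self, *exc) -> None:
        self._stop.set()
        self._thread.join()
        # Take one final reading so a value present at exit is not missed.
        self.peak_mib = max(self.peak_mib, self._read_mib())


def benchmark(generate: Callable[[str], object], prompts: List[str],
              read_vram_mib: Callable[[], int]) -> Dict[str, float]:
    """Time each generate(prompt) call; report avg/min/max seconds and sampled peak MiB."""
    times: List[float] = []
    with PeakSampler(read_vram_mib) as sampler:
        for prompt in prompts:
            start = time.perf_counter()
            generate(prompt)
            times.append(time.perf_counter() - start)
    return {
        "avg_s": statistics.mean(times),
        "min_s": min(times),
        "max_s": max(times),
        "peak_mib": sampler.peak_mib,
        "images": len(times),
    }
```

If `generate` can return before GPU work has finished, call `torch.cuda.synchronize()` inside it before returning so the timer covers the full generation.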

### Transformer Component Load

| Variant | Load to CUDA | VRAM after load | Torch allocated | Torch reserved | Transformer safetensors |
| --- | ---: | ---: | ---: | ---: | ---: |
| BF16 base transformer | 22.87s | 122,760 MiB | 122,121 MiB | 122,132 MiB | 119.21 GiB |
| SDNQ INT8 transformer | 16.50s | 63,920 MiB | 63,018 MiB | 63,200 MiB | 61.17 GiB |
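
The ratios implied by this table can be recomputed directly from its values. The checkpoint lands at roughly 0.51 of the BF16 size rather than exactly 0.5; this is likely explained by the recipe leaving conv and embedding layers unquantized plus storage for quantization scales, though that explanation is an inference rather than something the card measures.

```python
"""Derived ratios from the Transformer Component Load table (values copied from the card)."""

bf16 = {"load_s": 22.87, "vram_mib": 122_760, "safetensors_gib": 119.21}
int8 = {"load_s": 16.50, "vram_mib": 63_920, "safetensors_gib": 61.17}

size_ratio = int8["safetensors_gib"] / bf16["safetensors_gib"]
vram_saved_mib = bf16["vram_mib"] - int8["vram_mib"]
vram_ratio = int8["vram_mib"] / bf16["vram_mib"]
load_speedup = bf16["load_s"] / int8["load_s"]

print(f"checkpoint size ratio: {size_ratio:.3f}")
print(f"VRAM saved after load: {vram_saved_mib:,} MiB ({vram_ratio:.3f}x of BF16)")
print(f"load-time speedup: {load_speedup:.2f}x")
```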

### Full Pipeline Generation

The stress set consists of ten handwritten JSON-caption prompts that target Cyrillic text, reflections, multi-object composition, anatomy, small details, and scene-following.

| Variant | Full pipeline load | VRAM after load | Torch allocated after load | Avg generation time | Min / max generation time | Peak sampled VRAM | Images |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| BF16 base pipeline | 31.31s | 125,134 MiB | 124,386 MiB | 16.05s | 15.51s / 17.97s | 141,104 MiB | 10 |
| SDNQ INT8 pipeline | 26.79s | 67,268 MiB | 66,528 MiB | 25.51s | 21.57s / 36.53s | 83,202 MiB | 10 |
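
On this B200 the INT8 pipeline trades speed for memory: average generation is slower, while peak sampled VRAM drops by tens of GiB. The arithmetic below only restates the table; whether the trade is worthwhile depends on whether the BF16 pipeline fits on your GPU at all.

```python
"""Memory/latency trade-off from the Full Pipeline Generation table (values copied from the card)."""

bf16 = {"avg_gen_s": 16.05, "peak_vram_mib": 141_104}
int8 = {"avg_gen_s": 25.51, "peak_vram_mib": 83_202}

slowdown = int8["avg_gen_s"] / bf16["avg_gen_s"]
peak_saved_mib = bf16["peak_vram_mib"] - int8["peak_vram_mib"]
peak_saved_gib = peak_saved_mib / 1024

print(f"average generation slowdown: {slowdown:.2f}x")
print(f"peak sampled VRAM saved: {peak_saved_mib:,} MiB (~{peak_saved_gib:.1f} GiB)")
```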

### Original NVIDIA Example Caption

The original model repository provides [`assets/example_caption.json`](https://huggingface.co/nvidia/Cosmos3-Super-Text2Image/blob/main/assets/example_caption.json). The images below were generated locally from that JSON caption with seed 1143, 1024x1024 resolution, 50 steps, and guidance scale 4.0.
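
To rerun this comparison, the caption can be fetched from the base repository and turned into the same call arguments. A sketch under assumptions: `fetch_example_caption` and `example_call_kwargs` are hypothetical helpers, the JSON file is assumed to hold a single caption object like the one in the pipeline example above, and `negative_prompt=""` is copied from that example because this section does not state it. `hf_hub_download` is the standard `huggingface_hub` single-file download function.

```python
"""Sketch: rebuild the NVIDIA example-caption call with the settings listed above."""

import json
from typing import Any, Dict

BASE_REPO = "nvidia/Cosmos3-Super-Text2Image"


def fetch_example_caption() -> Dict[str, Any]:
    """Download assets/example_caption.json from the base repository and parse it."""
    from huggingface_hub import hf_hub_download  # lazy import: only needed for the download

    path = hf_hub_download(repo_id=BASE_REPO, filename="assets/example_caption.json")
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def example_call_kwargs(caption: Dict[str, Any], size: int = 1024) -> Dict[str, Any]:
    """Pipeline keyword arguments matching this section's settings (seed set by the caller)."""
    return {
        "prompt": json.dumps(caption),
        "negative_prompt": "",  # assumption: copied from the pipeline example
        "num_frames": 1,
        "height": size,
        "width": size,
        "num_inference_steps": 50,
        "guidance_scale": 4.0,
    }
```

With `pipe` assembled as in the pipeline example, a run would look like `pipe(**example_call_kwargs(fetch_example_caption()), generator=torch.Generator(device="cuda").manual_seed(1143))`.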

| Variant | Pipeline load | Generation time | Peak sampled VRAM |
| --- | ---: | ---: | ---: |
| BF16 base pipeline | 35.41s | 18.01s | 141,098 MiB |
| SDNQ INT8 pipeline | 25.79s | 66.05s | 83,218 MiB |

BF16 reference output:

![BF16 output for NVIDIA example caption](examples/nvidia_example_caption_bf16.png)

SDNQ INT8 output:

![SDNQ INT8 output for NVIDIA example caption](examples/nvidia_example_caption_sdnq_int8.png)

## Stress Prompt Examples

The following ten images were generated from the same handwritten stress prompts and seeds used for the Full Pipeline Generation benchmark.

![01 metro archive reading room](examples/01_metro_archive_reading_room_sdnq_int8.png)
![02 arctic greenhouse night shift](examples/02_arctic_greenhouse_night_shift_sdnq_int8.png)
![03 control room restoration](examples/03_control_room_restoration_sdnq_int8.png)
![04 rain market cross section](examples/04_rain_market_cross_section_sdnq_int8.png)
![05 manuscript restoration table](examples/05_manuscript_restoration_table_sdnq_int8.png)
![06 robotic assembly line signage](examples/06_robotic_assembly_line_signage_sdnq_int8.png)
![07 kitchen storm chess table](examples/07_kitchen_storm_chess_table_sdnq_int8.png)
![08 orbital cockpit cyrillic ui](examples/08_orbital_cockpit_cyrillic_ui_sdnq_int8.png)
![09 flood command center](examples/09_flood_command_center_sdnq_int8.png)
![10 cyrillic newspaper press](examples/10_cyrillic_newspaper_press_sdnq_int8.png)

## Notes

This repository is an independent transformer-only quantization artifact. NVIDIA's original card states that Cosmos3-Super-Text2Image was tested in BF16; this SDNQ artifact should be treated as an experimental deployment variant and evaluated for each workload.