Cosmos3-Super-Text2Image SDNQ INT8 Transformer

This repository contains a transformer-only SDNQ INT8 quantization of nvidia/Cosmos3-Super-Text2Image.

This card does not repeat the original one. For NVIDIA's model description, prompt-format guidance, license, and safety notes, see nvidia/Cosmos3-Super-Text2Image.

Only transformer/ is provided as a weight artifact. The VAE, scheduler, tokenizers, safety checker, and other components are loaded from the base model.

The quantization format comes from Disty0/sdnq. SD.Next's quantization overview is here: vladmandic/sdnext Quantization.

Recipe

Setting	Value
Weights dtype	`int8`
Static quantization	`True`
Dynamic quantization	`False`
SVD	`False`
SVD rank / steps	`32` / `8`
Quantized matmul	`True`
Dequantize FP32	`True`
Quantized conv / embedding	`False` / `False`

Quantization run: 20.90s; save time: 85.31s; transformer safetensors: 61.17 GiB.
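To make the `int8` weights row concrete, here is a minimal sketch of generic symmetric int8 weight quantization in plain Python. It is an illustration only, not SDNQ's actual algorithm, which may group, round, and scale weights differently; the names quantize_int8 and dequantize are hypothetical.

```python
def quantize_int8(row):
    """Symmetric quantization of one weight row: int8 values plus one float scale."""
    max_abs = max(abs(x) for x in row)
    # Map the largest magnitude to 127; fall back to 1.0 for an all-zero row.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(x / scale))) for x in row]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from int8 values and their scale."""
    return [v * scale for v in q]


weights = [0.50, -1.27, 0.03, 0.90]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The round-trip error per weight is bounded by half a quantization step.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_err)
```

Each weight then costs one byte plus a share of the stored scale, which is why the quantized transformer file is roughly half the size of the BF16 one.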

Assemble The Pipeline

import json
import torch
from diffusers import Cosmos3OmniPipeline, Cosmos3OmniTransformer
from diffusers.schedulers.scheduling_unipc_multistep import UniPCMultistepScheduler
from huggingface_hub import snapshot_download
from sdnq.loader import load_sdnq_model

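# Download this repository and get the local snapshot directory.
snapshot_path = snapshot_download("WaveCut/Cosmos3-Super-Text2Image-SDNQ-Int8-Transformer")

# Load the quantized transformer from the local transformer/ folder onto the GPU.
transformer = load_sdnq_model(
    f"{snapshot_path}/transformer",
    model_cls=Cosmos3OmniTransformer,
    dtype=torch.bfloat16,
    device=torch.device("cuda"),
    dequantize_fp32=True,
    use_quantized_matmul=True,
)

# Build the pipeline from the base model, swapping in the quantized transformer.
# All other components are loaded from nvidia/Cosmos3-Super-Text2Image.
pipe = Cosmos3OmniPipeline.from_pretrained(
    "nvidia/Cosmos3-Super-Text2Image",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
    device_map="cuda",
    enable_safety_checker=True,
)
# Use UniPC with flow_shift=3.0, the same value as in the benchmarks below.
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config, flow_shift=3.0)
pipe.to("cuda")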

json_caption = {
    "subjects": [],
    "background_setting": "A concise scene description.",
    "comprehensive_t2i_caption": "A detailed natural-language caption.",
    "resolution": {"H": 1024, "W": 1024},
    "aspect_ratio": "1,1",
}

result = pipe(
    prompt=json.dumps(json_caption),
    negative_prompt="",
    num_frames=1,
    height=1024,
    width=1024,
    num_inference_steps=50,
    guidance_scale=4.0,
    generator=torch.Generator(device="cuda").manual_seed(1143),
)
result.video[0].save("cosmos3_sdnq_int8.png")

load_sdnq_model expects a local path, so download this repository first, for example with huggingface_hub.snapshot_download("WaveCut/Cosmos3-Super-Text2Image-SDNQ-Int8-Transformer") as in the code above, and pass snapshot_path + "/transformer".
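Because the loader takes a local directory, a small check before loading can turn a missing or partial download into a clear error. The helper below is a hypothetical convenience, not part of sdnq or huggingface_hub; it only assumes the transformer/ subfolder layout described above.

```python
import tempfile
from pathlib import Path


def transformer_dir(snapshot_path):
    """Return the local transformer/ directory inside a downloaded snapshot."""
    path = Path(snapshot_path) / "transformer"
    if not path.is_dir():
        raise FileNotFoundError(f"No transformer/ folder under {snapshot_path}")
    return str(path)


# Demo on a throwaway directory that mimics the snapshot layout.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "transformer").mkdir()
    found = transformer_dir(tmp)
    found_name = Path(found).name

    # A path without a transformer/ folder is rejected with a clear error.
    try:
        transformer_dir(Path(tmp) / "missing")
        missing_raised = False
    except FileNotFoundError:
        missing_raised = True
```

With a real download, the value returned by transformer_dir(snapshot_path) can be passed as the first argument to load_sdnq_model.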

Benchmarks

All numbers were measured on a single RunPod NVIDIA B200 instance with local container storage and cached model files, using PyTorch 2.9.1+cu130. Generation settings: 1024x1024 images, 50 inference steps, guidance scale 4.0, flow_shift=3.0, system prompt enabled.
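The timing columns in the tables below report average, minimum, and maximum generation time. How they were collected is not stated here; one plausible way is a small wall-clock harness like this sketch. time_runs is a hypothetical helper, and for GPU work the timed function should wait for the device to finish (for example with torch.cuda.synchronize()) before returning, otherwise the timer stops early.

```python
import time
from statistics import mean


def time_runs(fn, inputs):
    """Call fn(x) once per input and return (avg, min, max) wall-clock seconds."""
    durations = []
    for x in inputs:
        start = time.perf_counter()
        fn(x)  # for CUDA work, fn should synchronize before returning
        durations.append(time.perf_counter() - start)
    return mean(durations), min(durations), max(durations)


# Stand-in workload; in practice fn would wrap one pipeline call per prompt.
avg_s, min_s, max_s = time_runs(lambda n: sum(range(n)), [1_000, 10_000, 100_000])
print(f"avg {avg_s:.6f}s, min {min_s:.6f}s, max {max_s:.6f}s")
```

The memory columns would need separate instrumentation, such as torch.cuda.memory_allocated() and torch.cuda.memory_reserved() for the Torch figures and periodic device-level sampling for peak VRAM.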

Transformer Component Load

Variant	Load to CUDA	VRAM after load	Torch allocated	Torch reserved	Transformer safetensors
BF16 base transformer	22.87s	122,760 MiB	122,121 MiB	122,132 MiB	119.21 GiB
SDNQ INT8 transformer	16.50s	63,920 MiB	63,018 MiB	63,200 MiB	61.17 GiB
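The file sizes in this table allow a rough sanity check. Assuming almost all of the BF16 file is 2-byte weights, the transformer has about 64 billion parameters, and the gap between the INT8 file and a pure one-byte-per-weight file hints at the overhead from scales and any layers left unquantized. The arithmetic below is a back-of-the-envelope estimate under that assumption, not a figure from NVIDIA or SDNQ.

```python
GIB = 2**30

bf16_gib = 119.21  # BF16 transformer safetensors, from the table
int8_gib = 61.17   # SDNQ INT8 transformer safetensors, from the table

# Assumption: nearly every byte in the BF16 file is a 2-byte weight.
approx_params = bf16_gib * GIB / 2

# Size if every weight were stored as exactly one byte with no extra data.
ideal_int8_gib = approx_params / GIB

# Whatever is left over: scales, metadata, and layers kept in higher precision.
overhead_gib = int8_gib - ideal_int8_gib
size_ratio = int8_gib / bf16_gib

print(f"approx. parameters: {approx_params / 1e9:.1f} B")
print(f"ideal INT8 size:    {ideal_int8_gib:.2f} GiB")
print(f"overhead:           {overhead_gib:.2f} GiB")
print(f"INT8 / BF16 size:   {size_ratio:.3f}")
```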

Full Pipeline Generation

The stress set consists of ten handwritten JSON-caption prompts that target Cyrillic text, reflections, multi-object composition, anatomy, small details, and scene-following.

Variant	Full pipeline load	VRAM after load	Torch allocated after load	Avg generation time	Min / max generation time	Peak sampled VRAM	Images
BF16 base pipeline	31.31s	125,134 MiB	124,386 MiB	16.05s	15.51s / 17.97s	141,104 MiB	10
SDNQ INT8 pipeline	26.79s	67,268 MiB	66,528 MiB	25.51s	21.57s / 36.53s	83,202 MiB	10
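The trade-off in this table comes down to two ratios: how much memory the INT8 pipeline saves and how much longer it takes per image. The snippet below only re-derives those ratios from the numbers in the table; the dictionary keys are illustrative names.

```python
# Values copied from the full-pipeline table (MiB and seconds).
bf16 = {"vram_after_load": 125_134, "peak_vram": 141_104, "avg_gen_s": 16.05}
int8 = {"vram_after_load": 67_268, "peak_vram": 83_202, "avg_gen_s": 25.51}

load_vram_ratio = int8["vram_after_load"] / bf16["vram_after_load"]
peak_vram_ratio = int8["peak_vram"] / bf16["peak_vram"]
slowdown = int8["avg_gen_s"] / bf16["avg_gen_s"]

print(f"VRAM after load: {load_vram_ratio:.1%} of BF16")
print(f"Peak VRAM:       {peak_vram_ratio:.1%} of BF16")
print(f"Avg generation:  {slowdown:.2f}x the BF16 time")
```

On this stress set, the INT8 pipeline peaks at roughly 59% of the BF16 memory while taking about 1.6 times as long per image on average.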

Original NVIDIA Example Caption

The original model repository provides assets/example_caption.json. The images below were generated locally from that JSON caption with seed 1143 at 1024x1024, 50 steps, and guidance scale 4.0.

Variant	Pipeline load	Generation time	Peak sampled VRAM
BF16 base pipeline	35.41s	18.01s	141,098 MiB
SDNQ INT8 pipeline	25.79s	66.05s	83,218 MiB
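To reuse a caption file such as assets/example_caption.json as a prompt, it has to be turned back into the JSON string that the pipeline call receives. The sketch below shows one way to do that with a local file; caption_file_to_prompt is a hypothetical helper, and the demo caption is a placeholder that copies the keys from the pipeline example rather than NVIDIA's actual file.

```python
import json
import tempfile
from pathlib import Path


def caption_file_to_prompt(path):
    """Read a JSON caption file and return it as a prompt string."""
    caption = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(caption, dict):
        raise ValueError("Expected a JSON object at the top level of the caption file")
    return json.dumps(caption)


# Demo with a placeholder caption written to a temporary file.
demo_caption = {
    "subjects": [],
    "background_setting": "A concise scene description.",
    "comprehensive_t2i_caption": "A detailed natural-language caption.",
    "resolution": {"H": 1024, "W": 1024},
    "aspect_ratio": "1,1",
}
with tempfile.TemporaryDirectory() as tmp:
    caption_path = Path(tmp) / "example_caption.json"
    caption_path.write_text(json.dumps(demo_caption), encoding="utf-8")
    prompt = caption_file_to_prompt(caption_path)

# With network access, the real file could be fetched first, for example:
#   from huggingface_hub import hf_hub_download
#   path = hf_hub_download("nvidia/Cosmos3-Super-Text2Image", "assets/example_caption.json")
#   prompt = caption_file_to_prompt(path)
```

The resulting string can be passed as the prompt argument in place of json.dumps(json_caption).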

BF16 reference output:

SDNQ INT8 output:

Stress Prompt Examples

The following ten images use the same handwritten stress prompt set and seeds as the benchmark table.

Notes

This repository is an independent transformer-only quantization artifact. NVIDIA's original card states that Cosmos3-Super-Text2Image was tested in BF16; this SDNQ artifact should be treated as an experimental deployment variant and evaluated for each workload.
