	---
	base_model: nvidia/Cosmos3-Super-Text2Image
	library_name: diffusers
	pipeline_tag: text-to-image
	tags:
	- cosmos3
	- diffusers
	- sdnq
	- text-to-image
	- int8
	license: other
	license_name: openmdw1.1-license
	license_link: https://openmdw.ai/license/1-1/
	---

	# Cosmos3-Super-Text2Image SDNQ INT8 Transformer

	This repository contains a transformer-only SDNQ quantization for [nvidia/Cosmos3-Super-Text2Image](https://huggingface.co/nvidia/Cosmos3-Super-Text2Image).

	It does not repeat the original model card. Read NVIDIA's model card, prompt-format guidance, license, and safety notes here:
	[nvidia/Cosmos3-Super-Text2Image](https://huggingface.co/nvidia/Cosmos3-Super-Text2Image).

	Only `transformer/` is provided as a weight artifact. The VAE, scheduler, tokenizers, safety checker, and other components are loaded from the base model.

	The quantization format comes from [Disty0/sdnq](https://github.com/Disty0/sdnq). SD.Next's quantization overview is here:
	[vladmandic/sdnext Quantization](https://github.com/vladmandic/sdnext/wiki/Quantization).

	## Recipe

	| Setting | Value |
	| --- | --- |
	| Weights dtype | `int8` |
	| Static quantization | `True` |
	| Dynamic quantization | `False` |
	| SVD | `False` |
	| SVD rank / steps | `32` / `8` |
	| Quantized matmul | `True` |
	| Dequantize FP32 | `True` |
	| Quantized conv / embedding | `False` / `False` |

	Quantization run: 20.90s; save time: 85.31s; transformer safetensors: 61.17 GiB.
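
For orientation, here is a minimal pure-Python sketch of what static symmetric `int8` weight quantization generally means: each row of weights is mapped to integers in [-127, 127] with one floating-point scale, and dequantization multiplies the integers by that scale again. It illustrates the general idea only. It is not SDNQ's implementation, and the function names and the per-row scaling choice are assumptions made for this example.

```python
def quantize_row_int8(row):
    """Symmetric int8 quantization of one row of float weights."""
    max_abs = max(abs(w) for w in row)
    # Guard against an all-zero row, where any scale would work.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in row]
    return codes, scale


def dequantize_row(codes, scale):
    """Map int8 codes back to floating-point values."""
    return [c * scale for c in codes]


weights = [0.50, -1.27, 0.031, 0.0]
codes, scale = quantize_row_int8(weights)
restored = dequantize_row(codes, scale)

# Rounding to the nearest code bounds the error by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(codes, scale, max_error)
```

Storing one byte per weight instead of two (BF16) is why the quantized transformer file is roughly half the size of the original.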

	## Assemble The Pipeline

	```python
	import json

	import torch
	from diffusers import Cosmos3OmniPipeline, Cosmos3OmniTransformer
	from diffusers.schedulers.scheduling_unipc_multistep import UniPCMultistepScheduler
	from huggingface_hub import snapshot_download
	from sdnq.loader import load_sdnq_model

	# Download this repository and load the quantized transformer from its local path.
	snapshot_path = snapshot_download("WaveCut/Cosmos3-Super-Text2Image-SDNQ-Int8-Transformer")
	transformer = load_sdnq_model(
	    f"{snapshot_path}/transformer",
	    model_cls=Cosmos3OmniTransformer,
	    dtype=torch.bfloat16,
	    device=torch.device("cuda"),
	    dequantize_fp32=True,
	    use_quantized_matmul=True,
	)

	# All other components (VAE, scheduler, tokenizers, safety checker) come from the base model.
	pipe = Cosmos3OmniPipeline.from_pretrained(
	    "nvidia/Cosmos3-Super-Text2Image",
	    transformer=transformer,
	    torch_dtype=torch.bfloat16,
	    device_map="cuda",
	    enable_safety_checker=True,
	)
	pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config, flow_shift=3.0)
	pipe.to("cuda")

	# JSON caption; see NVIDIA's model card for the prompt-format guidance.
	json_caption = {
	    "subjects": [],
	    "background_setting": "A concise scene description.",
	    "comprehensive_t2i_caption": "A detailed natural-language caption.",
	    "resolution": {"H": 1024, "W": 1024},
	    "aspect_ratio": "1,1",
	}

	# Same settings as the benchmarks below: 1024x1024, 50 steps, guidance scale 4.0.
	result = pipe(
	    prompt=json.dumps(json_caption),
	    negative_prompt="",
	    num_frames=1,
	    height=1024,
	    width=1024,
	    num_inference_steps=50,
	    guidance_scale=4.0,
	    generator=torch.Generator(device="cuda").manual_seed(1143),
	)
	result.video[0].save("cosmos3_sdnq_int8.png")
	```

	`load_sdnq_model` expects a local path. Download this repository first, or use `huggingface_hub.snapshot_download("WaveCut/Cosmos3-Super-Text2Image-SDNQ-Int8-Transformer")` and pass `snapshot_path + "/transformer"`.
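
In the snippet above the image size appears twice: once in the caption's `resolution` field and once in the `height` / `width` arguments. A small helper can derive the call arguments from the caption so the two cannot drift apart. This is only a sketch: `caption_to_call_kwargs` is a hypothetical helper, not part of diffusers or SDNQ, and it assumes the `resolution` keys `H` and `W` shown above.

```python
import json


def caption_to_call_kwargs(caption):
    """Build prompt, height, and width keyword arguments from a JSON-caption dict."""
    resolution = caption["resolution"]
    return {
        "prompt": json.dumps(caption),
        "height": int(resolution["H"]),
        "width": int(resolution["W"]),
    }


json_caption = {
    "subjects": [],
    "background_setting": "A concise scene description.",
    "comprehensive_t2i_caption": "A detailed natural-language caption.",
    "resolution": {"H": 1024, "W": 1024},
    "aspect_ratio": "1,1",
}
call_kwargs = caption_to_call_kwargs(json_caption)
print(call_kwargs["height"], call_kwargs["width"])
```

The resulting dictionary can be unpacked into the pipeline call with `**call_kwargs`, alongside the remaining arguments such as `num_inference_steps` and `guidance_scale`.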

	## Benchmarks

	Measured on one RunPod NVIDIA B200 instance with local container storage, cached model files, PyTorch `2.9.1+cu130`, 1024x1024 image generation, 50 inference steps, guidance scale 4.0, `flow_shift=3.0`, system prompt enabled.

	### Transformer Component Load

	| Variant | Load to CUDA | VRAM after load | Torch allocated | Torch reserved | Transformer safetensors |
	| --- | ---: | ---: | ---: | ---: | ---: |
	| BF16 base transformer | 22.87s | 122,760 MiB | 122,121 MiB | 122,132 MiB | 119.21 GiB |
	| SDNQ INT8 transformer | 16.50s | 63,920 MiB | 63,018 MiB | 63,200 MiB | 61.17 GiB |

	### Full Pipeline Generation

	The stress set consists of ten handwritten JSON-caption prompts designed to probe Cyrillic text rendering, reflections, multi-object composition, anatomy, small details, and scene following.

	| Variant | Full pipeline load | VRAM after load | Torch allocated after load | Avg generation time | Min / max generation time | Peak sampled VRAM | Images |
	| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
	| BF16 base pipeline | 31.31s | 125,134 MiB | 124,386 MiB | 16.05s | 15.51s / 17.97s | 141,104 MiB | 10 |
	| SDNQ INT8 pipeline | 26.79s | 67,268 MiB | 66,528 MiB | 25.51s | 21.57s / 36.53s | 83,202 MiB | 10 |
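
The memory-versus-speed trade-off in the full-pipeline table can be summarized with a little arithmetic. The sketch below uses only numbers copied from that table; the variable and function names are illustrative.

```python
# Values copied from the full-pipeline table above (MiB and seconds).
bf16 = {"load_vram": 125_134, "peak_vram": 141_104, "avg_gen_s": 16.05}
int8 = {"load_vram": 67_268, "peak_vram": 83_202, "avg_gen_s": 25.51}


def percent_saved(before, after):
    """Relative reduction, in percent, going from `before` to `after`."""
    return 100.0 * (before - after) / before


load_saving = percent_saved(bf16["load_vram"], int8["load_vram"])
peak_saving = percent_saved(bf16["peak_vram"], int8["peak_vram"])
slowdown = int8["avg_gen_s"] / bf16["avg_gen_s"]

print(f"VRAM after load: {load_saving:.1f}% lower")
print(f"Peak sampled VRAM: {peak_saving:.1f}% lower")
print(f"Average generation time: {slowdown:.2f}x the BF16 time")
```

On this stress set the INT8 pipeline uses about 46% less VRAM after load and about 41% less peak VRAM, while the average generation takes roughly 1.6 times as long.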

	### Original NVIDIA Example Caption

	The original model repository provides [`assets/example_caption.json`](https://huggingface.co/nvidia/Cosmos3-Super-Text2Image/blob/main/assets/example_caption.json). The images below were generated locally from that JSON caption with seed 1143, 1024x1024 resolution, 50 steps, and guidance scale 4.0.

	| Variant | Pipeline load | Generation time | Peak sampled VRAM |
	| --- | ---: | ---: | ---: |
	| BF16 base pipeline | 35.41s | 18.01s | 141,098 MiB |
	| SDNQ INT8 pipeline | 25.79s | 66.05s | 83,218 MiB |

	BF16 reference output:

	![BF16 output for NVIDIA example caption](examples/nvidia_example_caption_bf16.png)

	SDNQ INT8 output:

	![SDNQ INT8 output for NVIDIA example caption](examples/nvidia_example_caption_sdnq_int8.png)

	## Stress Prompt Examples

	The following ten images use the same handwritten stress prompt set and seeds as the benchmark table.

	![01 metro archive reading room](examples/01_metro_archive_reading_room_sdnq_int8.png)
	![02 arctic greenhouse night shift](examples/02_arctic_greenhouse_night_shift_sdnq_int8.png)
	![03 control room restoration](examples/03_control_room_restoration_sdnq_int8.png)
	![04 rain market cross section](examples/04_rain_market_cross_section_sdnq_int8.png)
	![05 manuscript restoration table](examples/05_manuscript_restoration_table_sdnq_int8.png)
	![06 robotic assembly line signage](examples/06_robotic_assembly_line_signage_sdnq_int8.png)
	![07 kitchen storm chess table](examples/07_kitchen_storm_chess_table_sdnq_int8.png)
	![08 orbital cockpit cyrillic ui](examples/08_orbital_cockpit_cyrillic_ui_sdnq_int8.png)
	![09 flood command center](examples/09_flood_command_center_sdnq_int8.png)
	![10 cyrillic newspaper press](examples/10_cyrillic_newspaper_press_sdnq_int8.png)

	## Notes

	This repository is an independent transformer-only quantization artifact. NVIDIA's original card states that Cosmos3-Super-Text2Image was tested in BF16; this SDNQ artifact should be treated as an experimental deployment variant and evaluated for each workload.