Cosmos3-Super-Text2Image — NF4 4-bit Pre-Quantized Transformer

Pre-quantized NF4 (4-bit, double-quantized) version of NVIDIA's nvidia/Cosmos3-Super-Text2Image, the 64B Cosmos 3 checkpoint specialized for high-fidelity text-to-image, created with bitsandbytes. Only the large Cosmos3OmniTransformer is quantized. The VAE (bf16) and tokenizers are bundled unchanged, so the repo is self-contained and works as a drop-in replacement.

This makes the 64B model practical on a single GPU. It also loads in ~1–2 minutes because there is no runtime quantization pass. By comparison, quantizing the bf16 original to NF4 on the fly takes ~13 minutes on every load.
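To make "NF4 (4-bit)" concrete: each block of weights is scaled by its absolute maximum, and every value is replaced by the index of the nearest of 16 fixed levels. The sketch below is an illustrative pure-Python version, not bitsandbytes' implementation. It assumes a blocksize of 64 (bitsandbytes' 4-bit default) and the NF4 level table published with QLoRA/bitsandbytes. It omits double quantization, which further compresses the per-block absmax values, and it does not pack two indices per byte the way the real GPU kernels do.

```python
# Illustrative pure-Python sketch of NF4 blockwise quantization.
# Assumptions: blocksize 64 and the NF4 level table from QLoRA/bitsandbytes.
# Real kernels run on the GPU, pack two 4-bit indices per byte, and
# (with double quantization) also compress the absmax values; this does none of that.

# The 16 NF4 levels, normalized to [-1, 1]; asymmetric, with an exact 0.0.
NF4_CODE = [
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.2461123913526535,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
]


def quantize_block(block):
    """Scale a block by its absmax and map each value to the nearest NF4 level index."""
    absmax = max(abs(x) for x in block) or 1.0  # all-zero block: any nonzero scale works
    indices = [
        min(range(16), key=lambda i: abs(NF4_CODE[i] - x / absmax))
        for x in block
    ]
    return absmax, indices


def quantize(weights, blocksize=64):
    """Split a flat weight list into blocks; keep one absmax plus 4-bit indices per block."""
    return [quantize_block(weights[i:i + blocksize])
            for i in range(0, len(weights), blocksize)]


def dequantize(blocks):
    """Approximate the original weights as level * absmax for every stored index."""
    out = []
    for absmax, indices in blocks:
        out.extend(NF4_CODE[i] * absmax for i in indices)
    return out
```

For example, `dequantize(quantize(weights))` returns a list of the same length as `weights`. Values that land exactly on a level, such as the block maximum and 0.0, round-trip exactly.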

Key Details

Property	Value
Repo size	35 GB (vs ~130 GB bf16)
Quantized component	`transformer` (NF4; ~128 GB in bf16)
Quantization	NF4 (bitsandbytes), double quantization, `bnb_4bit_compute_dtype=bfloat16`
Mode	text-to-image
Base params	64B
VRAM (loaded)	~37 GB
Source weights	nvidia/Cosmos3-Super-Text2Image (bf16)
Tested on	NVIDIA GB10 (DGX Spark)
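The size figures above roughly follow from per-parameter arithmetic. The sketch below is a back-of-the-envelope estimate, not a measurement. It assumes that all ~64e9 parameters live in quantized layers (in practice some small tensors stay in higher precision). It also assumes bitsandbytes' 4-bit blocksize of 64 and the second-level blocksize of 256 for double quantization described in the QLoRA paper.

```python
# Back-of-the-envelope checkpoint size for NF4 vs bf16 (estimate only).
# Assumptions: every one of ~64e9 parameters is in a quantized layer,
# 4-bit blocksize 64, and double-quantization blocksize 256 (QLoRA paper).

def bf16_bytes(n_params):
    """bf16 stores 2 bytes per parameter."""
    return n_params * 2


def nf4_bits_per_param(blocksize=64, dq_blocksize=256, double_quant=True):
    """4 bits per weight plus the amortized cost of the per-block absmax scales."""
    if double_quant:
        # absmax stored as 8-bit codes, plus one fp32 scale per dq_blocksize absmax values
        scale_bits = 8 / blocksize + 32 / (blocksize * dq_blocksize)
    else:
        # one fp32 absmax per block
        scale_bits = 32 / blocksize
    return 4 + scale_bits


def nf4_bytes(n_params, **kwargs):
    """Estimated storage in bytes for n_params NF4-quantized weights."""
    return n_params * nf4_bits_per_param(**kwargs) / 8


n = 64e9
print(f"bf16:        {bf16_bytes(n) / 1e9:.1f} GB")
print(f"NF4 (no DQ): {nf4_bytes(n, double_quant=False) / 1e9:.1f} GB")
print(f"NF4 + DQ:    {nf4_bytes(n) / 1e9:.1f} GB")
```

This prints about 128.0 GB for bf16, 36.0 GB for NF4 without double quantization, and 33.0 GB with it. The last figure is in the same range as the 35 GB repo size. The remainder would plausibly be the bf16 VAE, the tokenizer files, and any tensors left unquantized.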

Usage

Requires a diffusers build with Cosmos 3 support (currently installed from source), plus bitsandbytes and accelerate. The NF4 quantization config is embedded in the checkpoint. Do not pass a quantization_config, and do not call .to(dtype) on the 4-bit model.

pip install "git+https://github.com/huggingface/diffusers.git" bitsandbytes accelerate

import torch
from diffusers import Cosmos3OmniPipeline

pipe = Cosmos3OmniPipeline.from_pretrained(
    "SanDiegoDude/Cosmos3-Super-Text2Image-nf4",
    torch_dtype=torch.bfloat16,
    enable_safety_checker=False,  # skips the optional cosmos_guardrail dependency
).to("cuda")

result = pipe("A weathered lighthouse on a cliff at golden hour, photoreal, 50mm.")
frames = result.video[0]          # text-to-image returns a single frame
frames[0].save("out.png")

ComfyUI

A turnkey loader and text-to-image (T2I) node are available in scg-Cosmos3. The loader auto-detects this pre-quantized layout and skips the re-quantization pass.
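A loader can tell a pre-quantized checkpoint apart from the bf16 original by inspecting the transformer's saved config. The sketch below is not scg-Cosmos3's code. It assumes the diffusers convention that a pre-quantized component's `config.json` carries a `quantization_config` entry with keys such as `quant_method`, `load_in_4bit`, and `bnb_4bit_quant_type`. Check the actual file in this repo before relying on those key names. `load_transformer_config` and `is_prequantized_nf4` are hypothetical helper names.

```python
import json
from pathlib import Path


def load_transformer_config(repo_dir):
    """Read transformer/config.json from a local diffusers-style snapshot (layout assumed)."""
    return json.loads((Path(repo_dir) / "transformer" / "config.json").read_text())


def is_prequantized_nf4(config):
    """True if the config embeds a bitsandbytes NF4 quantization_config (key names assumed)."""
    qc = config.get("quantization_config") or {}  # treat a missing or null entry as "not quantized"
    return (
        qc.get("quant_method") == "bitsandbytes"
        and bool(qc.get("load_in_4bit"))
        and qc.get("bnb_4bit_quant_type") == "nf4"
    )
```

A loader would load the weights as-is when `is_prequantized_nf4(load_transformer_config(path))` is true. Otherwise it would fall back to on-the-fly quantization of the bf16 weights.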

Related Repos

Original model (bf16, source): nvidia/Cosmos3-Super-Text2Image
16B omnimodal variant (NF4): SanDiegoDude/Cosmos3-Nano-nf4

License

Released under NVIDIA's OpenMDW 1.1 License, inherited from the base model. Quantization only changes the weight encoding.
