Cosmos3-Super-Text2Image: NF4 4-bit Pre-Quantized Transformer

Pre-quantized NF4 (4-bit, double-quantized) version of NVIDIA's nvidia/Cosmos3-Super-Text2Image, the 64B Cosmos 3 checkpoint specialized for high-fidelity text-to-image, created with bitsandbytes. Only the large Cosmos3OmniTransformer is quantized; the VAE and tokenizers are bundled unchanged at bf16, so the repo is self-contained and drop-in.

This makes the 64B model practical on a single GPU and loads in ~1–2 minutes with no runtime quantization pass (on-the-fly NF4 of the bf16 original takes ~13 minutes every load).

Key Details

| Property | Value |
|---|---|
| Repo size | 35 GB (vs ~130 GB bf16) |
| Quantized component | transformer, NF4 (vs ~128 GB bf16) |
| Quantization | NF4 (bitsandbytes), double quantization, bnb_4bit_compute_dtype=bfloat16 |
| Mode | text-to-image |
| Base params | 64B |
| VRAM (loaded) | ~37 GB |
| Source weights | nvidia/Cosmos3-Super-Text2Image (bf16) |
| Tested on | NVIDIA GB10 (DGX Spark) |
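
The size figures above can be sanity-checked with a little arithmetic. The sketch below is a rough estimate, not a measurement of this repo: it assumes every one of the 64B parameters sits in a quantized linear layer (in practice norms and similar small layers usually stay in higher precision), and it assumes the block sizes from the QLoRA recipe (64 weights per block, 256 first-level scales per second-level block). The function names are hypothetical.

```python
def bf16_gigabytes(n_params: float) -> float:
    """bf16 stores each parameter in 2 bytes."""
    return n_params * 2 / 1e9


def nf4_gigabytes(
    n_params: float,
    block_size: int = 64,          # assumed: weights sharing one scale
    nested_block_size: int = 256,  # assumed: scales sharing one second-level scale
    double_quant: bool = True,
) -> float:
    """Estimate NF4 storage: 4 bits per weight plus per-block scale overhead."""
    bits_per_param = 4.0
    if double_quant:
        # 8-bit quantized scale per block, plus an fp32 scale per group of scales
        bits_per_param += 8 / block_size + 32 / (block_size * nested_block_size)
    else:
        # one fp32 scale per block
        bits_per_param += 32 / block_size
    return n_params * bits_per_param / 8 / 1e9


n = 64e9
print(f"bf16:                 {bf16_gigabytes(n):.1f} GB")
print(f"NF4 (double quant):   {nf4_gigabytes(n):.1f} GB")
print(f"NF4 (no double quant): {nf4_gigabytes(n, double_quant=False):.1f} GB")
```

Under these assumptions the transformer comes to roughly 33 GB in NF4 against 128 GB in bf16. That is consistent with the 35 GB repo size once the unquantized VAE, tokenizers, and any layers kept in bf16 are added.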

Usage

Requires a diffusers build with Cosmos 3 support (currently from source) plus bitsandbytes. The NF4 config is embedded: do not pass a quantization_config, and do not call .to(dtype) on a 4-bit model.

pip install "git+https://github.com/huggingface/diffusers.git" bitsandbytes accelerate
import torch
from diffusers import Cosmos3OmniPipeline

pipe = Cosmos3OmniPipeline.from_pretrained(
    "SanDiegoDude/Cosmos3-Super-Text2Image-nf4",
    torch_dtype=torch.bfloat16,
    enable_safety_checker=False,  # skips the optional cosmos_guardrail dependency
).to("cuda")

result = pipe("A weathered lighthouse on a cliff at golden hour, photoreal, 50mm.")
frames = result.video[0]          # text-to-image returns a single frame
frames[0].save("out.png")
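
To see why no quantization_config needs to be passed, it can help to look at what an embedded config looks like. The sketch below is illustrative only: it assumes the quantization settings are serialized under a `quantization_config` key in the transformer's config.json using the usual bitsandbytes field names, and the sample JSON is a hand-written stand-in, not a copy of this repo's file. The helper name is hypothetical.

```python
import json


def describe_embedded_quantization(config: dict) -> str:
    """Summarize the bitsandbytes settings stored in a model config, if any."""
    qc = config.get("quantization_config")
    if not qc:
        return "not pre-quantized"
    if not qc.get("load_in_4bit", False):
        return "pre-quantized, but not 4-bit"
    quant_type = qc.get("bnb_4bit_quant_type", "unknown")
    double = (
        "double quantization"
        if qc.get("bnb_4bit_use_double_quant")
        else "single quantization"
    )
    compute = qc.get("bnb_4bit_compute_dtype", "unknown")
    return f"4-bit {quant_type}, {double}, compute dtype {compute}"


# Hand-written stand-in for a transformer config.json (assumed layout).
sample_config = json.loads("""
{
  "quantization_config": {
    "quant_method": "bitsandbytes",
    "load_in_4bit": true,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": true,
    "bnb_4bit_compute_dtype": "bfloat16"
  }
}
""")

print(describe_embedded_quantization(sample_config))
print(describe_embedded_quantization({}))
```

When a config like this is present, the loader has everything it needs to rebuild the 4-bit layers, which is why supplying a second quantization_config at load time is unnecessary.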

ComfyUI

A turnkey loader + T2I node is available in scg-Cosmos3. The loader auto-detects this pre-quantized layout and skips the re-quant pass.

Related Repos

License

Released under NVIDIA's OpenMDW 1.1 License, inherited from the base model. Quantization only changes the weight encoding.
