Cosmos3-Nano: NVFP4-AWQ (4-bit, Blackwell-native)
An NVFP4 (4-bit), weight-only, activation-aware (AWQ) quantization of
nvidia/Cosmos3-Nano, produced with NVIDIA TensorRT Model Optimizer. NVFP4 is Blackwell's native
4-bit format: E2M1 values with one FP8 (E4M3) scale per 16-element block. The transformer's
attention and FFN linear layers (~11.8 B parameters, 77.6% of the total) are stored in NVFP4;
embeddings, norms, the diffusion time-embedder, and the modality adapters stay in BF16. Activations
also stay in BF16, since only the weights are quantized.
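To make the format concrete, here is a minimal pure-Python sketch of quantizing and then dequantizing one weight row NVFP4-style. It assumes the commonly described NVFP4 layout: 16-element blocks, an FP8 E4M3 scale per block, and an FP32 per-tensor scale that keeps block scales inside E4M3's range. It illustrates the number format only; it is not modelopt's kernel, and tie-breaking is simplified.

```python
import math

# Representable magnitudes of FP4 E2M1 (the sign bit is handled separately).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_MAX = 6.0
E4M3_MAX = 448.0
BLOCK = 16  # one FP8 scale is shared by 16 consecutive weights


def round_to_e4m3(x: float) -> float:
    """Round a non-negative float to the nearest FP8 E4M3 value, saturating at 448."""
    if x <= 0.0:
        return 0.0
    x = min(x, E4M3_MAX)
    _, e = math.frexp(x)       # x = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, -6)       # unbiased exponent, clamped to the subnormal range
    step = 2.0 ** (exp - 3)    # 3 mantissa bits -> 8 steps per power of two
    return min(round(x / step) * step, E4M3_MAX)


def round_to_e2m1(x: float) -> float:
    """Round to the nearest signed E2M1 value (ties go to the smaller magnitude)."""
    mag = min(E2M1_GRID, key=lambda g: abs(abs(x) - g))
    return math.copysign(mag, x)


def nvfp4_fake_quant(weights: list[float]) -> list[float]:
    """Quantize, then dequantize, a 1-D weight row using two-level NVFP4 scaling."""
    amax = max((abs(w) for w in weights), default=0.0)
    # Per-tensor FP32 scale chosen so every block scale fits in E4M3's range.
    global_scale = amax / (E2M1_MAX * E4M3_MAX) if amax > 0 else 1.0
    out: list[float] = []
    for start in range(0, len(weights), BLOCK):
        block = weights[start:start + BLOCK]
        block_amax = max(abs(w) for w in block)
        # Map the block's largest magnitude onto E2M1's max (6.0); the block scale
        # itself is stored in E4M3, relative to the per-tensor scale.
        scale = round_to_e4m3(block_amax / E2M1_MAX / global_scale) * global_scale
        if scale == 0.0:
            out.extend(0.0 for _ in block)
            continue
        out.extend(round_to_e2m1(w / scale) * scale for w in block)
    return out
```

Values already on the scaled E2M1 grid survive the round trip unchanged; everything else snaps to one of only 8 magnitudes per block, which is why the per-block scale (and AWQ's choice of what to protect) matters.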
Derivative of nvidia/Cosmos3-Nano. © NVIDIA. Distributed under OpenMDW-1.1 (license and NVIDIA copyright/origin notices retained, per the license). Not affiliated with or endorsed by NVIDIA.
Precision options (pick by hardware)
| Build | Approx. total size | Fits a 16 GB GPU? | Quality |
|---|---|---|---|
| NVFP4-AWQ / INT4-AWQ (this tier) | ~13 GB | Yes (tight) | Near-zero loss; the hardest hands and text can wobble |
| FP8 | ~18 GB | No (needs ~24 GB) | Near-indistinguishable from BF16 |
| BF16 (original) | ~33 GB | No | Reference |
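The sizes above can be roughly reproduced from the parameter split stated in the introduction. A back-of-envelope sketch, under assumptions the card does not state: NVFP4 costs 4.5 bits per weight (a 4-bit value plus one 8-bit block scale per 16 weights, ignoring the tiny per-tensor scale), the FP8 build quantizes the same 11.8 B linears at 8 bits, and every other transformer parameter is 2-byte BF16.

```python
# From the card: 11.8 B quantized linear-layer parameters, 77.6% of the total.
QUANT_PARAMS = 11.8e9
QUANT_FRACTION = 0.776


def weight_bytes(bits_per_quantized_weight: float) -> float:
    """Estimated transformer weight bytes when the linears use the given bit width."""
    total_params = QUANT_PARAMS / QUANT_FRACTION
    bf16_rest = (total_params - QUANT_PARAMS) * 2   # unquantized params stay BF16
    quantized = QUANT_PARAMS * bits_per_quantized_weight / 8
    return quantized + bf16_rest


# Assumed storage cost per quantized weight, in bits.
BUILDS = [("NVFP4", 4 + 8 / 16), ("FP8", 8.0), ("BF16", 16.0)]

for name, bits in BUILDS:
    print(f"{name}: ~{weight_bytes(bits) / 1e9:.1f} GB")
```

It prints roughly 13.4, 18.6 and 30.4 GB. The first two land close to the table's ~13 GB and ~18 GB; this estimate covers transformer weights only and does not account for the remaining gap to the table's ~33 GB BF16 figure.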
Quality vs BF16 (96-prompt anatomy-weighted sweep)
| Metric | BF16 | NVFP4-AWQ |
|---|---|---|
| PickScore (human preference) | 21.85 | 21.82 (Δ −0.03) |
| FID vs BF16 (lower is better) | n/a | 80.6 (best distribution match of all 4-bit recipes) |
| Functional fidelity (velocity cosine) | 1.000 | ~0.998 |
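The card does not spell out how the velocity cosine is computed. One plausible reading, sketched here as an assumption, is the cosine similarity between the velocity tensors predicted by the BF16 and NVFP4 transformers for identical inputs, averaged over denoising steps; 1.000 then means the two models point in exactly the same direction. The function names are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two flattened velocity tensors."""
    if len(a) != len(b):
        raise ValueError("velocity tensors must have the same number of elements")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def mean_velocity_cosine(ref_steps: Sequence[Sequence[float]],
                         quant_steps: Sequence[Sequence[float]]) -> float:
    """Average per-step cosine between reference and quantized velocity predictions."""
    if len(ref_steps) != len(quant_steps) or not ref_steps:
        raise ValueError("need the same, non-zero number of denoising steps")
    sims = [cosine_similarity(r, q) for r, q in zip(ref_steps, quant_steps)]
    return sum(sims) / len(sims)
```

Cosine ignores overall magnitude, so it measures whether the quantized model steers the denoising trajectory in the same direction, not whether it predicts the same step size.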
FID context: BF16 vs. BF16 at a different seed (same prompts, N = 96) scores 138.6. NVFP4-AWQ's 80.6 is
well below that seed-noise floor: it tracks BF16 more closely than BF16 tracks itself across seeds.
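For reading these numbers: FID fits a Gaussian to image features (conventionally Inception-v3 pool features) for each image set and reports the Fréchet distance between the two Gaussians. With feature means $\mu$ and covariances $\Sigma$ for the BF16 reference set and the quantized set $q$:

$$
\mathrm{FID} = \lVert \mu_{\mathrm{BF16}} - \mu_{q} \rVert_2^2 + \operatorname{Tr}\!\left(\Sigma_{\mathrm{BF16}} + \Sigma_{q} - 2\left(\Sigma_{\mathrm{BF16}}\,\Sigma_{q}\right)^{1/2}\right)
$$

Here both statistics come from only N = 96 images, and covariance estimates from so few samples bias FID upward. That is a likely reason the absolute values are large, and why the useful comparison is 80.6 against the 138.6 seed baseline rather than against FID scores reported on large benchmarks.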
Caveat: densely interlocking hands and on-image text can still wobble. These cases are hard for the base
model and fail in BF16 too. See worst_case_contact_sheet.png.
Usage
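```python
import torch
from huggingface_hub import snapshot_download
from diffusers import Cosmos3OmniPipeline, Cosmos3OmniTransformer
import modelopt.torch.opt as mto

repo = snapshot_download("Reza2kn/Cosmos3-Nano-NVFP4-AWQ")
# Build a BF16 transformer from its config, then let modelopt restore the
# NVFP4 quantizer state and quantized weights on top of it.
tf = Cosmos3OmniTransformer.from_config(
    Cosmos3OmniTransformer.load_config(f"{repo}/transformer/config.json")).to(torch.bfloat16)
mto.restore(tf, f"{repo}/transformer/modelopt_quantized.pt")
# Pass the restored transformer in; the remaining pipeline components load as usual.
pipe = Cosmos3OmniPipeline.from_pretrained(
    repo, transformer=tf, torch_dtype=torch.bfloat16, enable_safety_checker=False).to("cuda")
# Autocast is required: otherwise float32 tensors from the rotary embeddings
# reach the BF16 linear layers.
with torch.autocast("cuda", dtype=torch.bfloat16):
    # num_frames=1 renders a single still image: frame 0 of the first output video.
    img = pipe("A red panda astronaut floating in a nebula", num_frames=1, height=480, width=480).video[0][0]
```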
Alternatively: `from load_quantized import load; pipe = load()`. Requires diffusers (git main or ≥ 0.39),
nvidia-modelopt, and a CUDA 12.8 (cu128) build of PyTorch. Runs best on Blackwell (sm_120), which executes NVFP4 natively; on other GPUs it still runs through modelopt's dequantization path.
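A quick pre-flight check for these requirements can save a failed load. The sketch below is hypothetical helper code, not part of this repo; it assumes Blackwell-class GPUs report CUDA compute capability 10.x or newer (e.g. sm_100, sm_120) and treats dev builds such as 0.39.0.dev0 as 0.39.

```python
import importlib.metadata
import re


def parse_major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from strings such as '0.39.0' or '0.39.0.dev0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def diffusers_is_recent_enough(minimum: tuple[int, int] = (0, 39)) -> bool:
    """True if the installed diffusers reports at least `minimum`."""
    try:
        installed = importlib.metadata.version("diffusers")
    except importlib.metadata.PackageNotFoundError:
        return False
    return parse_major_minor(installed) >= minimum


def has_native_nvfp4(capability: tuple[int, int]) -> bool:
    """Assumption: Blackwell-class GPUs report compute capability 10.x or newer."""
    return capability[0] >= 10


def report() -> None:
    print("diffusers >= 0.39:", diffusers_is_recent_enough())
    try:
        import torch  # imported lazily so the helpers above work without torch
    except ImportError:
        print("torch is not installed")
        return
    print("torch CUDA build:", torch.version.cuda)  # a cu128 wheel reports '12.8'
    if torch.cuda.is_available():
        major, minor = torch.cuda.get_device_capability()
        print(f"GPU sm_{major}{minor}, native NVFP4: {has_native_nvfp4((major, minor))}")
    else:
        print("no CUDA device visible")


if __name__ == "__main__":
    report()
```

A False for native NVFP4 does not mean the model will fail to load; per the note above, it only means modelopt's dequantization path is used instead of native 4-bit execution.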
Method
Quantized with modelopt's NVFP4_AWQ_LITE_CFG (the awq_lite algorithm), weight-only, and calibrated on
multimodal image and video prompts run through the real denoising loop. Quantized modules: self_attn.*,
mlp.*, mlp_moe_gen.*, and lm_head. Kept in BF16: embeddings, norms, time_embedder, proj_in/out, and the
audio/action adapters.
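The "activation-aware" part works roughly like this: a few input channels carry much larger activations, so errors in their weights cost more. AWQ scales those weight columns up before quantization (and the matching activations down, which leaves the full-precision product unchanged), choosing the scale strength α by a small grid search on calibration activations. The sketch below illustrates that search in pure Python using the AWQ paper's form s = mean|x|^α. It substitutes per-row round-to-nearest INT4 for NVFP4, and modelopt's awq_lite may differ in scale normalization, search grid, and error metric; all names are hypothetical.

```python
def quantize_rows_int4(w):
    """Symmetric per-row round-to-nearest INT4 fake quantization (stand-in for NVFP4)."""
    out = []
    for row in w:
        amax = max(abs(v) for v in row)
        scale = amax / 7 if amax > 0 else 1.0
        out.append([max(-8, min(7, round(v / scale))) * scale for v in row])
    return out


def matmul(w, x):
    """y[i][t] = sum_j w[i][j] * x[j][t]; w is (out x in), x is (in x tokens)."""
    return [[sum(w[i][j] * x[j][t] for j in range(len(x))) for t in range(len(x[0]))]
            for i in range(len(w))]


def mse(a, b):
    n = sum(len(r) for r in a)
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)) / n


def awq_search(w, x, grid=21):
    """Grid-search alpha in [0, 1]; return (alpha, per-channel scales, output MSE)."""
    ref = matmul(w, x)
    # Per-input-channel activation magnitude: mean |x| over calibration tokens.
    act = [sum(abs(v) for v in row) / len(row) for row in x]
    best = None
    for k in range(grid):
        alpha = k / (grid - 1)
        s = [max(a, 1e-8) ** alpha for a in act]
        # Scale weight columns up by s, quantize, then fold 1/s back in. This equals
        # quantizing W*diag(s) and feeding it activations scaled down by 1/s.
        w_scaled = [[v * s[j] for j, v in enumerate(row)] for row in w]
        q = quantize_rows_int4(w_scaled)
        w_eff = [[v / s[j] for j, v in enumerate(row)] for row in q]
        err = mse(matmul(w_eff, x), ref)
        if best is None or err < best[2]:
            best = (alpha, s, err)
    return best
```

Because α = 0 (all scales 1) is in the grid, the search can never do worse on the calibration data than plain round-to-nearest quantization; the gain comes from channels like an outlier activation row, whose weights get finer effective resolution.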
Base model: nvidia/Cosmos3-Nano