Cosmos3-Nano: FP8 (safetensors)
Weight-only FP8 (E4M3) quantization of the Cosmos3OmniTransformer for Cosmos3-Nano,
delivered as safetensors. Produced by Session 3 (safetensors export + diffusers load path).
The transformer drops from ~30 GB (bf16) to ~15 GB; VAE, vision encoder, tokenizers, and scheduler
remain bf16. Runs the diffusers Cosmos3OmniPipeline on a single RTX 5090 (32 GB).
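As a rough check on those sizes, weight storage scales with bytes per parameter: bf16 uses 2 bytes and FP8 uses 1. The sketch below back-derives an approximate parameter count from the ~30 GB figure (an inference, not a number stated in this card) and ignores the bf16 keep-modules and scale buffers that remain in the quantized file.

```python
# Bytes needed to store one parameter in each format.
BYTES_PER_PARAM = {"bf16": 2, "fp8_e4m3": 1}


def weights_gb(num_params: float, dtype: str) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Assumption: parameter count inferred from the ~30 GB bf16 size quoted above.
approx_params = 30e9 / BYTES_PER_PARAM["bf16"]

bf16_gb = weights_gb(approx_params, "bf16")
fp8_gb = weights_gb(approx_params, "fp8_e4m3")
print(f"bf16: ~{bf16_gb:.0f} GB, fp8: ~{fp8_gb:.0f} GB")
```

The estimate only covers the transformer; the VAE, vision encoder, and tokenizers add to the total footprint on the GPU.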
Load
import torch
from load_quantized import load  # self-contained; needs torch, diffusers, modelopt, safetensors

pipe = load(".")  # "." is this checkpoint directory

# Run under bf16 autocast; num_frames=1 produces a single image.
with torch.autocast("cuda", torch.bfloat16):
    img = pipe("a corgi astronaut", num_frames=1, height=480, width=480).video[0][0]
img.save("out.png")
Format (Path B)
The transformer is serialized as safetensors plus a tiny structural sidecar:
| File | Contents |
|---|---|
| transformer/diffusion_pytorch_model.safetensors | 505 weight-only E4M3 weights + per-tensor weight_quantizer._amax / ._scale buffers + bf16 keep-modules (1819 tensors) |
| transformer/modelopt_state.pt | 724 KB tensor-free ModelOpt structural state (quantizer layout); needed to rebuild the quantizer modules |
| transformer/config.json | transformer config (action_gen=false) |
| quantization_config.json | recipe, exclusions, and the scale_layout (key suffixes, counts, granularity) |
| transformer/modelopt_quantized.pt | retained fallback: the ModelOpt .pt, loadable via modelopt.torch.opt.restore |
Load = from_config (action_gen=False) → modelopt.torch.opt.restore_from_modelopt_state →
load_state_dict(strict=True). The loader reads only the safetensors file and the modelopt_state.pt sidecar, never the retained modelopt_quantized.pt.
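The three load steps can be sketched roughly as follows. This is not the shipped load_quantized.py: the function name load_transformer and the transformer_cls parameter are hypothetical, and the sketch assumes the transformer class follows the usual diffusers ConfigMixin interface (load_config / from_config).

```python
from pathlib import Path


def load_transformer(ckpt_dir: str, transformer_cls):
    """Rebuild the quantized transformer from safetensors plus the sidecar.

    transformer_cls is assumed to be the diffusers model class for the
    Cosmos3 transformer; it is passed in rather than imported here.
    """
    # Heavy dependencies are imported lazily so the module itself stays light.
    import torch
    import modelopt.torch.opt as mto
    from safetensors.torch import load_file

    root = Path(ckpt_dir) / "transformer"

    # 1. Build the bare architecture from config.json (no weights yet).
    config = transformer_cls.load_config(str(root))
    model = transformer_cls.from_config(config, action_gen=False)

    # 2. Re-create the quantizer modules from the structural sidecar.
    #    weights_only=False unpickles the file: trusted sources only.
    state = torch.load(str(root / "modelopt_state.pt"), weights_only=False)
    model = mto.restore_from_modelopt_state(model, state)

    # 3. Load the FP8 weights, scale buffers, and bf16 keep-modules.
    tensors = load_file(str(root / "diffusion_pytorch_model.safetensors"))
    model.load_state_dict(tensors, strict=True)
    return model
```

strict=True matters in step 3: if the quantizer layout restored in step 2 does not match the keys in the safetensors file, loading fails loudly instead of silently skipping scale buffers.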
Security: modelopt_state.pt (and the retained modelopt_quantized.pt) are loaded with torch.load(weights_only=False), which unpickles the file and can execute arbitrary code. Load this checkpoint only from a source you trust; a tampered sidecar amounts to remote code execution at load time. The *.safetensors weights are safe; only the small structural sidecar uses pickle.
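One way to reduce the risk is to pin the sidecar to a known digest before it is unpickled. The sketch below is a minimal example using only the standard library; the helper names and the expected_sha256 value are hypothetical, since this card does not publish a digest, so you would record one yourself from a copy you trust.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sidecar(path: Path, expected_sha256: str) -> None:
    """Raise ValueError if the file does not match the pinned digest."""
    actual = sha256_of(path)
    if actual != expected_sha256.lower():
        raise ValueError(f"{path}: SHA-256 mismatch (got {actual})")
```

Calling verify_sidecar(Path("transformer/modelopt_state.pt"), expected_sha256) before load(".") makes a modified sidecar fail before any pickle code runs. It does not help if the trusted copy itself was malicious.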
Why a sidecar instead of pure export_hf_checkpoint? ModelOpt's unified HF export (diffusers dispatch) does not recognize Cosmos3OmniTransformer and drops the per-tensor FP8 scales, so its safetensors cannot be dequantized. Path B (above) preserves them. See docs/reports/session_3.md.
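To see why the scales matter, here is a pure-Python sketch of per-tensor E4M3 quantization. It assumes the common convention scale = amax / 448 (448 being the largest finite E4M3 value); whether ModelOpt's _scale buffer stores exactly this quantity is an assumption, and the function names are illustrative.

```python
import math

E4M3_MAX = 448.0  # largest finite value in the E4M3 format


def round_to_e4m3(x: float) -> float:
    """Round to the nearest representable E4M3 value, saturating at +/-448."""
    if x == 0.0:
        return 0.0
    mag = min(abs(x), E4M3_MAX)
    _, e = math.frexp(mag)        # mag = m * 2**e with 0.5 <= m < 1
    exponent = max(e - 1, -6)     # below 2**-6 the format is subnormal
    step = 2.0 ** (exponent - 3)  # 3 mantissa bits: 8 steps per power of two
    return math.copysign(round(mag / step) * step, x)


def quantize(weights):
    """Return (codes, scale) so that code * scale approximates each weight."""
    amax = max(abs(w) for w in weights)
    scale = amax / E4M3_MAX if amax > 0 else 1.0
    return [round_to_e4m3(w / scale) for w in weights], scale


def dequantize(codes, scale):
    return [c * scale for c in codes]


weights = [0.02, -0.5, 0.13, 0.9]
codes, scale = quantize(weights)
restored = dequantize(codes, scale)
```

The codes always span the full E4M3 range regardless of the original weight magnitudes, so a file that keeps the codes but drops the scale has lost the information needed to map them back.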
Recipe & scope (INV-2 / INV-3)
Weight-only FP8 E4M3 (activation quantizers disabled). Quantized: self_attn.*, mlp.*,
mlp_moe_gen.*, lm_head (505 Linears). Kept bf16: token embeddings, all norms, time_embedder,
proj_in/proj_out, audio/action adapters.
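The scope above amounts to a name-based filter over the model's Linear layers. The sketch below illustrates that idea with glob patterns; the module names in the example are hypothetical, and the actual exclusion rules live in quantization_config.json, which this sketch does not read.

```python
from fnmatch import fnmatchcase

# Patterns taken from the recipe above; anything not matched stays bf16.
QUANTIZED_PATTERNS = ["*self_attn.*", "*mlp.*", "*mlp_moe_gen.*", "*lm_head"]


def is_quantized(module_name: str) -> bool:
    """True if a Linear layer with this name falls under the FP8 recipe."""
    return any(fnmatchcase(module_name, p) for p in QUANTIZED_PATTERNS)


# Hypothetical module names, for illustration only.
examples = [
    "layers.0.self_attn.q_proj",
    "layers.0.mlp.up_proj",
    "layers.3.mlp_moe_gen.experts.0.down_proj",
    "lm_head",
    "time_embedder.linear_1",
    "proj_in",
]
selected = [name for name in examples if is_quantized(name)]
```

Here the first four names are selected and the last two are kept in bf16, mirroring the split described in the recipe.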
Equivalence
Reproduces the ModelOpt .pt (and thus the NVIDIA-style reference FP8 recipe) bitwise:
weight round-trip max-abs-diff 0.0 (1812 tensors); pipeline latent error M1 = 0.0 and LPIPS = 0.0
on EC-01..04 at 8 steps and EC-01 at 35 steps (seed 123, UniPC flow_shift=10.0, 1 frame at 480×480).
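A weight round-trip check of this kind compares every tensor in two state dicts and reports the largest element-wise difference. The verification script itself is not part of this card, so the following is only a sketch of the metric; it uses plain lists of floats in place of tensors to stay dependency-free.

```python
def max_abs_diff(a: dict, b: dict) -> float:
    """Largest |x - y| over all entries of two same-shaped state dicts."""
    if a.keys() != b.keys():
        raise KeyError("state dicts have different tensor names")
    worst = 0.0
    for name in a:
        xs, ys = a[name], b[name]
        if len(xs) != len(ys):
            raise ValueError(f"{name}: length mismatch")
        for x, y in zip(xs, ys):
            worst = max(worst, abs(x - y))
    return worst


# Toy stand-ins for the reference and re-exported weights.
reference = {"w1": [0.5, -1.25], "w2": [3.0]}
exported = {"w1": [0.5, -1.25], "w2": [3.0]}
diff = max_abs_diff(reference, exported)
```

A result of exactly 0.0 means the two sets of weights are identical value for value, which is the bitwise claim made above.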
Limitations
- action_gen=False build (matches the reference quantized checkpoint, whose .pt is action-adapter-stripped). No action-conditioned generation from this checkpoint.
- Verification at the smoke setting (1 frame / 480×480); full-res 720p/189-frame is out of scope.
- FP8 compute is ModelOpt fake-quant (compute in bf16); real-FP4/FP8 kernel speedups are out of scope.
Base model: nvidia/Cosmos3-Nano