Z-Anime NVFP4 — Blackwell Optimized

Full NVFP4-quantized versions of SeeSee21/Z-Anime (S3-DiT architecture) for NVIDIA Blackwell hardware, intended for use in ComfyUI with comfy_kitchen.

Sample Output

1024×1024, 50 steps, CFG 4.0, euler_ancestral, seed 7777

Benchmark (GB10)

Metric	FP8	NVFP4	NVFP4 vs. FP8
Model size	5.8 GB	3.5 GB	1.7× smaller
Inference (50 steps)	96.5 s	56.5 s	1.71× faster
Inference (4 steps, distilled)	~8 s	~5 s	~1.6× faster

The original BF16 checkpoint is 11.5 GB, so the NVFP4 file is 3.3× smaller than the original.

All benchmarks: 1024×1024, seed 7777, NVIDIA GB10.
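
The ratios in the benchmark table follow directly from the raw figures. A quick check, with variable names chosen here purely for illustration:

```python
# Raw figures copied from the benchmark table (GB10, 1024x1024, seed 7777).
fp8_size_gb, nvfp4_size_gb, bf16_size_gb = 5.8, 3.5, 11.5
fp8_50_steps_s, nvfp4_50_steps_s = 96.5, 56.5


def ratio(baseline: float, candidate: float) -> float:
    """How many times smaller (or faster) the candidate is than the baseline."""
    return baseline / candidate


size_vs_fp8 = ratio(fp8_size_gb, nvfp4_size_gb)
size_vs_bf16 = ratio(bf16_size_gb, nvfp4_size_gb)
speedup_50_steps = ratio(fp8_50_steps_s, nvfp4_50_steps_s)
nvfp4_seconds_per_step = nvfp4_50_steps_s / 50

print(f"size vs FP8:       {size_vs_fp8:.2f}x")       # 1.66x
print(f"size vs BF16:      {size_vs_bf16:.2f}x")      # 3.29x
print(f"speedup, 50 steps: {speedup_50_steps:.2f}x")  # 1.71x
print(f"NVFP4 time/step:   {nvfp4_seconds_per_step:.2f} s")  # 1.13 s
```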

Hardware Requirements

NVIDIA Blackwell GPU (SM ≥ 10.0): GB10, GB200, RTX 5090
ComfyUI with comfy_kitchen support
On non-Blackwell hardware, the weights are dequantized at runtime instead; this works but is slow
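
To see which path a machine is likely to take, the SM requirement above can be checked from PyTorch. This is a minimal sketch: the helper names are hypothetical, and whether comfy_kitchen applies exactly this threshold internally is an assumption, not something taken from its source.

```python
from typing import Optional, Tuple

# "SM >= 10.0" from the hardware requirements, as a (major, minor) pair.
BLACKWELL_MIN_SM: Tuple[int, int] = (10, 0)


def has_native_nvfp4(capability: Tuple[int, int]) -> bool:
    """True if a (major, minor) compute capability meets the SM >= 10.0 requirement."""
    return tuple(capability) >= BLACKWELL_MIN_SM


def detect_capability() -> Optional[Tuple[int, int]]:
    """Return the compute capability of GPU 0, or None without PyTorch/CUDA."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


if __name__ == "__main__":
    cap = detect_capability()
    if cap is None:
        print("No CUDA device (or PyTorch) found")
    elif has_native_nvfp4(cap):
        print(f"SM {cap[0]}.{cap[1]}: meets the Blackwell requirement")
    else:
        print(f"SM {cap[0]}.{cap[1]}: expect the slower dequantization fallback")
```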

Variants

z-anime-distill-4step-nvfp4.safetensors — Distilled 4-step model (recommended for speed)
z-anime-base-nvfp4.safetensors — Base model, 28–50 steps with CFG

Setup

Place safetensors files in ComfyUI/models/diffusion_models/
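Download the text encoder (Qwen 3 4B, loaded through CLIPLoader) and the VAE from SeeSee21/Z-Anime, and place them in ComfyUI/models/text_encoders/ and ComfyUI/models/vae/ respectively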
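
The first step can also be scripted with huggingface_hub. The sketch below assumes the two variant files sit at the root of the r0b0tlab/Z-Anime-NVFP4 repository under the names listed in Variants; the helper names are illustrative. The text encoder and VAE are left out because their file names in SeeSee21/Z-Anime are not given here.

```python
from pathlib import Path
from typing import List

REPO_ID = "r0b0tlab/Z-Anime-NVFP4"  # repository name as shown on this model card
VARIANT_FILES = [
    "z-anime-distill-4step-nvfp4.safetensors",
    "z-anime-base-nvfp4.safetensors",
]


def target_dir(comfy_root: str) -> Path:
    """Folder ComfyUI scans for diffusion models."""
    return Path(comfy_root) / "models" / "diffusion_models"


def download_variants(comfy_root: str, filenames: List[str] = VARIANT_FILES) -> List[str]:
    """Download the selected variants into ComfyUI and return their local paths."""
    # Imported lazily so the path helper works without the dependency installed.
    from huggingface_hub import hf_hub_download  # pip install huggingface_hub

    dest = target_dir(comfy_root)
    dest.mkdir(parents=True, exist_ok=True)
    return [
        hf_hub_download(repo_id=REPO_ID, filename=name, local_dir=dest)
        for name in filenames
    ]
```

For example, download_variants("ComfyUI", VARIANT_FILES[:1]) would fetch only the distilled model.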

ComfyUI Workflow

UNETLoader: variant safetensors, weight_dtype: default
CLIPLoader: qwen_3_4b.safetensors, type: qwen_image
VAELoader: Z-Anime VAE
ModelSamplingAuraFlow: shift=3.5
KSampler: sampler euler_ancestral, scheduler beta
- Distill: 4 steps, CFG 1.0
- Base: 28–50 steps, CFG 3.0–5.0
Resolution: 832×1216 (portrait), 1216×832 (landscape), 1024×1024
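
The node list above can be written as a ComfyUI API-format graph. This is a sketch under assumptions rather than an exported workflow: the prompt-encoding, latent, decode, and save nodes (CLIPTextEncode, EmptySD3LatentImage, VAEDecode, SaveImage) and their wiring are filled in here, and the VAE file name and prompt are placeholders.

```python
import json


def build_workflow(unet_name, steps, cfg, prompt, vae_name,
                   width=1024, height=1024, seed=7777, negative=""):
    """Return a ComfyUI API-format graph: node id -> class_type + inputs."""
    def node(class_type, **inputs):
        return {"class_type": class_type, "inputs": inputs}

    return {
        # Loaders, as listed in the workflow section.
        "1": node("UNETLoader", unet_name=unet_name, weight_dtype="default"),
        "2": node("CLIPLoader", clip_name="qwen_3_4b.safetensors", type="qwen_image"),
        "3": node("VAELoader", vae_name=vae_name),
        "4": node("ModelSamplingAuraFlow", model=["1", 0], shift=3.5),
        # Assumed supporting nodes (not named in the workflow section).
        "5": node("CLIPTextEncode", clip=["2", 0], text=prompt),
        "6": node("CLIPTextEncode", clip=["2", 0], text=negative),
        "7": node("EmptySD3LatentImage", width=width, height=height, batch_size=1),
        "8": node("KSampler", model=["4", 0], positive=["5", 0], negative=["6", 0],
                  latent_image=["7", 0], seed=seed, steps=steps, cfg=cfg,
                  sampler_name="euler_ancestral", scheduler="beta", denoise=1.0),
        "9": node("VAEDecode", samples=["8", 0], vae=["3", 0]),
        "10": node("SaveImage", images=["9", 0], filename_prefix="z-anime"),
    }


# Distilled variant at portrait resolution; the VAE name is a placeholder.
workflow = build_workflow(
    "z-anime-distill-4step-nvfp4.safetensors", steps=4, cfg=1.0,
    prompt="1girl, cherry blossoms, detailed background",
    vae_name="z_anime_vae.safetensors", width=832, height=1216,
)
print(json.dumps(workflow["8"], indent=2))
```

A graph like this is normally submitted by POSTing {"prompt": workflow} to the /prompt endpoint of a running ComfyUI server.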

Recommended Settings

Distill 4-step

Parameter	Value
Steps	4
CFG	1.0
Sampler	euler_ancestral
Scheduler	beta
Shift	3.5

Base

Parameter	Value
Steps	28–50
CFG	3.0–5.0
Sampler	euler_ancestral
Scheduler	beta
Shift	3.5
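
The two tables above can be kept as presets and used to sanity-check sampler settings before queuing a job. The class and function names in this sketch are illustrative only.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Preset:
    min_steps: int
    max_steps: int
    min_cfg: float
    max_cfg: float
    sampler: str = "euler_ancestral"
    scheduler: str = "beta"
    shift: float = 3.5


# Values copied from the recommended-settings tables.
PRESETS = {
    "distill": Preset(min_steps=4, max_steps=4, min_cfg=1.0, max_cfg=1.0),
    "base": Preset(min_steps=28, max_steps=50, min_cfg=3.0, max_cfg=5.0),
}


def check_settings(variant: str, steps: int, cfg: float) -> List[str]:
    """Return warnings for settings outside the recommended range."""
    preset = PRESETS[variant]
    warnings = []
    if not preset.min_steps <= steps <= preset.max_steps:
        warnings.append(
            f"steps={steps} outside {preset.min_steps}-{preset.max_steps}")
    if not preset.min_cfg <= cfg <= preset.max_cfg:
        warnings.append(
            f"cfg={cfg} outside {preset.min_cfg}-{preset.max_cfg}")
    return warnings


print(check_settings("distill", 4, 1.0))  # []
print(check_settings("base", 4, 1.0))     # two warnings: distill settings on the base model
```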

Quantization Details

Method: comfy_kitchen.quantize_nvfp4() with per-tensor scaling; tensors are padded to a multiple of the 16-element block size
Format: uint8 packed FP4 (E2M1) + float8_e4m3fn block scales + float32 per-tensor scale
Source: BF16 → NVFP4 (quantized directly from the BF16 weights, not from a dequantized FP8 checkpoint)
Distill 4-step: 171 layers quantized
Base: 239 layers quantized
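
To make the storage format concrete, here is a pure-Python illustration of two-level block scaling with packed E2M1 codes. It is not comfy_kitchen's implementation: block scales are left as Python floats instead of being rounded to float8_e4m3fn, the nibble order inside each byte is an assumption, and all function names are hypothetical.

```python
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # magnitudes representable in FP4 E2M1
BLOCK = 16            # elements that share one block scale
FP4_MAX = 6.0         # largest E2M1 magnitude
FP8_E4M3_MAX = 448.0  # largest float8_e4m3fn magnitude


def encode_fp4(x):
    """Map a pre-scaled value to a 4-bit code: sign bit plus nearest magnitude index."""
    mag = min(range(8), key=lambda i: abs(abs(x) - E2M1_VALUES[i]))
    return (8 if x < 0 else 0) | mag


def decode_fp4(code):
    value = E2M1_VALUES[code & 7]
    return -value if code & 8 else value


def quantize_row(row):
    """Return (packed bytes, block scales, per-tensor scale, original length)."""
    n = len(row)
    padded = list(row) + [0.0] * (-n % BLOCK)  # pad to a multiple of 16
    amax = max((abs(v) for v in padded), default=0.0)
    # The per-tensor scale keeps every block scale within the FP8 E4M3 range.
    tensor_scale = amax / (FP4_MAX * FP8_E4M3_MAX) or 1.0
    codes, block_scales = [], []
    for start in range(0, len(padded), BLOCK):
        block = padded[start:start + BLOCK]
        block_max = max(abs(v) for v in block)
        scale = block_max / FP4_MAX / tensor_scale or 1.0  # stored as FP8 in the real format
        block_scales.append(scale)
        codes.extend(encode_fp4(v / (scale * tensor_scale)) for v in block)
    # Two 4-bit codes per byte; low nibble first is an assumption.
    packed = bytes(lo | (hi << 4) for lo, hi in zip(codes[0::2], codes[1::2]))
    return packed, block_scales, tensor_scale, n


def dequantize_row(packed, block_scales, tensor_scale, n):
    codes = []
    for byte in packed:
        codes += [byte & 0x0F, byte >> 4]
    values = [decode_fp4(c) * block_scales[i // BLOCK] * tensor_scale
              for i, c in enumerate(codes)]
    return values[:n]  # drop the padding


row = [0.1 * i - 1.0 for i in range(20)]  # 20 values, padded to 32
packed, block_scales, tensor_scale, n = quantize_row(row)
restored = dequantize_row(packed, block_scales, tensor_scale, n)
max_error = max(abs(a - b) for a, b in zip(row, restored))
print(len(packed), len(block_scales), round(max_error, 3))
```

In this layout each weight costs 4 bits plus one 8-bit scale per 16 weights, about 4.5 bits on average, compared with 8 bits for FP8.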

Why NVFP4?

The S3-DiT architecture (Z-Image, Z-Anime) lacks the FP8-optimized kernels available for FLUX and SDXL, so it gains significantly from NVFP4. In the benchmark above, NVFP4 ran 1.71× faster than FP8 at 50 steps on a GB10.

Base Model

SeeSee21/Z-Anime — S3-DiT architecture, fine-tuned for anime-style generation. Apache 2.0 license.

Quantized by r0b0tlab on NVIDIA GB10.

Model tree for r0b0tlab/Z-Anime-NVFP4

Base model: Tongyi-MAI/Z-Image
Finetuned: SeeSee21/Z-Anime
Quantized (3): this model