Z-Anime NVFP4: Blackwell Optimized

Full NVFP4-quantized versions of SeeSee21/Z-Anime (S3-DiT architecture) for NVIDIA Blackwell hardware, for use with ComfyUI + comfy_kitchen.

Sample Output

Z-Anime NVFP4 Sample

1024×1024, 50 steps, CFG 4.0, euler_ancestral, seed 7777

Benchmark (GB10)

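| Metric | FP8 | NVFP4 | Improvement |
|---|---|---|---|
| Model size | 5.8 GB | 3.5 GB | 1.7× smaller |
| Inference (50 steps) | 96.5 s | 56.5 s | 1.71× faster |
| Inference (4-step distill) | ~8 s | ~5 s | ~1.6× faster |
| Model size vs. original BF16 (11.5 GB) | n/a | 3.5 GB | 3.3× smaller |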

All benchmarks: 1024×1024, seed 7777, NVIDIA GB10.
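The ratios in the table follow directly from the measured numbers. The short sketch below recomputes them and, for comparison, adds an idealized storage estimate that assumes every weight is stored as a 4-bit value plus one 8-bit block scale per 16 values (the per-tensor float32 scale is ignored as negligible). The measured file sizes need not match that ideal, since they also depend on which tensors are quantized.

```python
# Measured values copied from the benchmark table (GB10, 1024x1024, seed 7777).
FP8_SIZE_GB, NVFP4_SIZE_GB, BF16_SIZE_GB = 5.8, 3.5, 11.5
FP8_50STEP_S, NVFP4_50STEP_S = 96.5, 56.5


def ratio(baseline: float, new: float) -> float:
    """How many times smaller (or faster) `new` is than `baseline`."""
    return baseline / new


def bits_per_weight_nvfp4(block: int = 16, scale_bits: int = 8) -> float:
    """Idealized NVFP4 cost: a 4-bit value plus a share of one FP8 scale per block."""
    return 4 + scale_bits / block


if __name__ == "__main__":
    print(f"size vs FP8:        {ratio(FP8_SIZE_GB, NVFP4_SIZE_GB):.2f}x")
    print(f"size vs BF16:       {ratio(BF16_SIZE_GB, NVFP4_SIZE_GB):.2f}x")
    print(f"50-step time:       {ratio(FP8_50STEP_S, NVFP4_50STEP_S):.2f}x")
    # Ideal ratio if every FP8 weight (8 bits) became NVFP4 (4.5 bits).
    print(f"ideal size vs FP8:  {8 / bits_per_weight_nvfp4():.2f}x")
```

This gives roughly 1.66× (size vs. FP8), 3.29× (size vs. BF16), 1.71× (50-step time) and an idealized 8 / 4.5 ≈ 1.78× size reduction versus FP8.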

Hardware Requirements

  • NVIDIA Blackwell GPU (SM ≥ 10.0): GB10, GB200, RTX 5090
  • ComfyUI with comfy_kitchen support
  • On non-Blackwell hardware, the weights fall back to dequantization (functional but slow)
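To see which path a given machine is likely to take, you can compare the GPU's compute capability against the SM ≥ 10.0 requirement. The sketch below uses PyTorch's `torch.cuda.get_device_capability()`; it only mirrors the rule stated above and is an assumption-level check, not the logic ComfyUI or comfy_kitchen actually use to pick kernels.

```python
from typing import Optional, Tuple


def is_blackwell(capability: Tuple[int, int]) -> bool:
    """True if a CUDA compute capability (major, minor) satisfies SM >= 10.0."""
    return capability >= (10, 0)


def current_device_capability() -> Optional[Tuple[int, int]]:
    """Query GPU 0 through PyTorch; return None if torch or CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


if __name__ == "__main__":
    cap = current_device_capability()
    if cap is None:
        print("No CUDA device found")
    elif is_blackwell(cap):
        print(f"SM {cap[0]}.{cap[1]}: meets the SM >= 10.0 requirement")
    else:
        print(f"SM {cap[0]}.{cap[1]}: expect the slower dequantization fallback")
```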

Variants

  • z-anime-distill-4step-nvfp4.safetensors: distilled 4-step model (recommended for speed)
  • z-anime-base-nvfp4.safetensors: base model, 28–50 steps with CFG

Setup

  1. Place the .safetensors files in ComfyUI/models/diffusion_models/
  2. Download the CLIP text encoder (Qwen 3 4B) and the VAE from SeeSee21/Z-Anime
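A quick way to confirm step 1 is to check that the variant files are where ComfyUI expects diffusion models. This is a small sketch; the ComfyUI root you pass in is whatever directory you installed ComfyUI into, and the helper names are hypothetical.

```python
from pathlib import Path
from typing import List

# Variant filenames as listed in this model card.
VARIANTS = [
    "z-anime-distill-4step-nvfp4.safetensors",
    "z-anime-base-nvfp4.safetensors",
]


def target_dir(comfy_root: str) -> Path:
    """Directory the variant files should be placed in (per step 1)."""
    return Path(comfy_root) / "models" / "diffusion_models"


def missing_variants(comfy_root: str) -> List[str]:
    """Return the variant files that are not yet present in target_dir()."""
    target = target_dir(comfy_root)
    return [name for name in VARIANTS if not (target / name).is_file()]


if __name__ == "__main__":
    missing = missing_variants("ComfyUI")  # adjust to your install location
    print("all variants present" if not missing else f"missing: {missing}")
```

You only need the variant you plan to use, so one missing file is not an error.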

ComfyUI Workflow

  • UNETLoader: one of the variant .safetensors files, weight_dtype: default
  • CLIPLoader: qwen_3_4b.safetensors, type: qwen_image
  • VAELoader: Z-Anime VAE
  • ModelSamplingAuraFlow: shift=3.5
  • KSampler: euler_ancestral / beta
    • Distill: 4 steps, CFG 1.0
    • Base: 28–50 steps, CFG 3.0–5.0
  • Resolution: 832×1216 (portrait), 1216×832 (landscape), 1024×1024 (square)
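The same graph can be queued headlessly through ComfyUI's API (prompt) format, where each node is `{"class_type", "inputs"}` and links are `[node_id, output_index]`. The sketch below is a minimal example under several assumptions: the VAE filename is a placeholder, the empty negative prompt, seed, prompt text, the `EmptySD3LatentImage` latent node and the server address are illustrative choices not taken from this card, and node inputs should be checked against your ComfyUI version.

```python
import json
import urllib.request


def build_workflow(prompt: str,
                   unet: str = "z-anime-distill-4step-nvfp4.safetensors",
                   steps: int = 4, cfg: float = 1.0,
                   width: int = 832, height: int = 1216, seed: int = 7777) -> dict:
    """API-format graph mirroring the node list above (defaults: distill preset)."""
    return {
        "1": {"class_type": "UNETLoader",
              "inputs": {"unet_name": unet, "weight_dtype": "default"}},
        "2": {"class_type": "CLIPLoader",
              "inputs": {"clip_name": "qwen_3_4b.safetensors", "type": "qwen_image"}},
        # Placeholder VAE filename (assumption): use the Z-Anime VAE file you downloaded.
        "3": {"class_type": "VAELoader", "inputs": {"vae_name": "z-anime-vae.safetensors"}},
        "4": {"class_type": "ModelSamplingAuraFlow",
              "inputs": {"model": ["1", 0], "shift": 3.5}},
        "5": {"class_type": "CLIPTextEncode", "inputs": {"text": prompt, "clip": ["2", 0]}},
        # Empty negative prompt (assumption); it has no effect at CFG 1.0.
        "6": {"class_type": "CLIPTextEncode", "inputs": {"text": "", "clip": ["2", 0]}},
        # Latent node choice is an assumption.
        "7": {"class_type": "EmptySD3LatentImage",
              "inputs": {"width": width, "height": height, "batch_size": 1}},
        "8": {"class_type": "KSampler",
              "inputs": {"model": ["4", 0], "seed": seed, "steps": steps, "cfg": cfg,
                         "sampler_name": "euler_ancestral", "scheduler": "beta",
                         "positive": ["5", 0], "negative": ["6", 0],
                         "latent_image": ["7", 0], "denoise": 1.0}},
        "9": {"class_type": "VAEDecode", "inputs": {"samples": ["8", 0], "vae": ["3", 0]}},
        "10": {"class_type": "SaveImage",
               "inputs": {"images": ["9", 0], "filename_prefix": "z-anime-nvfp4"}},
    }


def queue_prompt(workflow: dict, server: str = "http://127.0.0.1:8188") -> bytes:
    """POST the graph to a running ComfyUI server's /prompt endpoint."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(f"{server}/prompt", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

For the base variant, call `build_workflow(prompt, unet="z-anime-base-nvfp4.safetensors", steps=50, cfg=4.0)` and pass the result to `queue_prompt`.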

Recommended Settings

Distill 4-step

| Parameter | Value |
|---|---|
| Steps | 4 |
| CFG | 1.0 |
| Sampler | euler_ancestral |
| Scheduler | beta |
| Shift | 3.5 |

Base

| Parameter | Value |
|---|---|
| Steps | 28–50 |
| CFG | 3.0–5.0 |
| Sampler | euler_ancestral |
| Scheduler | beta |
| Shift | 3.5 |
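Both presets set Shift 3.5 through ModelSamplingAuraFlow. To our understanding, ComfyUI applies this as the flow-matching time shift sigma = shift * t / (1 + (shift - 1) * t), where t in [0, 1] is the unshifted timestep. The sketch below reproduces that formula so you can see how the noise levels move; treat it as an illustration rather than a copy of ComfyUI's code.

```python
def shifted_sigma(t: float, shift: float = 3.5) -> float:
    """Flow-matching time shift: sigma = shift*t / (1 + (shift - 1)*t), t in [0, 1]."""
    if shift == 1.0:
        return t  # no shift: sigma follows t directly
    return shift * t / (1 + (shift - 1) * t)


if __name__ == "__main__":
    for t in (1.0, 0.75, 0.5, 0.25, 0.0):
        print(f"t={t:.2f} -> sigma={shifted_sigma(t):.3f}")
```

The endpoints stay fixed (t = 0 and t = 1 map to themselves), while intermediate values are pushed upward; for example t = 0.5 maps to sigma ≈ 0.78, so a larger share of the sampling steps is spent at high noise levels.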

Quantization Details

  • Method: comfy_kitchen.quantize_nvfp4() with per-tensor scaling + 16× block padding
  • Format: uint8-packed FP4 (E2M1, two values per byte) + float8_e4m3fn block scales + float32 per-tensor scale
  • Source: BF16 → NVFP4 (quantized directly, with no intermediate FP8 step)
  • Distill 4-step: 171 layers quantized
  • Base: 239 layers quantized
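To make the storage format above concrete, here is a simplified pure-Python sketch of NVFP4-style block quantization: values are rounded to the E2M1 grid (±0, 0.5, 1, 1.5, 2, 3, 4, 6) inside 16-element blocks, each block gets its own scale, one float32 scale covers the whole tensor, and two 4-bit codes are packed per uint8. The scale recipe (tensor scale = amax / (6 × 448), block scale = block amax / 6 relative to it), the low-nibble-first packing order, and keeping block scales as Python floats instead of rounding them to float8_e4m3fn are all assumptions. This is not comfy_kitchen.quantize_nvfp4(), and its byte layout may differ.

```python
from typing import List, Tuple

# Magnitudes representable by FP4 E2M1; the low 3 bits of a code index this table.
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
FP4_MAX = 6.0     # largest E2M1 magnitude
E4M3_MAX = 448.0  # largest float8_e4m3fn magnitude (range available to block scales)
BLOCK = 16        # elements per scaling block


def encode_e2m1(v: float) -> int:
    """Round v to the nearest E2M1 value; return its 4-bit code (bit 3 = sign)."""
    mag = min(range(len(E2M1_GRID)), key=lambda i: abs(E2M1_GRID[i] - abs(v)))
    return (0b1000 if v < 0 else 0) | mag


def decode_e2m1(code: int) -> float:
    mag = E2M1_GRID[code & 0b0111]
    return -mag if code & 0b1000 else mag


def quantize_nvfp4(x: List[float]) -> Tuple[bytes, List[float], float]:
    # Zero-pad to a multiple of the block size.
    padded = list(x) + [0.0] * (-len(x) % BLOCK)
    amax = max((abs(v) for v in padded), default=0.0)
    # Per-tensor scale chosen so every block scale fits in the E4M3 range (assumed recipe).
    tensor_scale = amax / (FP4_MAX * E4M3_MAX) if amax > 0 else 1.0
    codes: List[int] = []
    block_scales: List[float] = []
    for start in range(0, len(padded), BLOCK):
        block = padded[start:start + BLOCK]
        block_amax = max(abs(v) for v in block)
        # Real NVFP4 stores this as float8_e4m3fn; it stays a Python float here.
        bs = block_amax / (FP4_MAX * tensor_scale) if block_amax > 0 else 1.0
        block_scales.append(bs)
        codes.extend(encode_e2m1(v / (bs * tensor_scale)) for v in block)
    # Pack two codes per byte, even index in the low nibble (assumed order).
    packed = bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, len(codes), 2))
    return packed, block_scales, tensor_scale


def dequantize_nvfp4(packed: bytes, block_scales: List[float],
                     tensor_scale: float, n: int) -> List[float]:
    codes: List[int] = []
    for byte in packed:
        codes.extend((byte & 0x0F, byte >> 4))
    out = [decode_e2m1(c) * block_scales[i // BLOCK] * tensor_scale
           for i, c in enumerate(codes)]
    return out[:n]  # drop the padding
```

Values that land exactly on the scaled E2M1 grid round-trip unchanged; everything else is off by at most one sixth of its block's largest magnitude, because the widest gap in the grid (between 4 and 6) is 2.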

Why NVFP4?

The S3-DiT architecture (Z-Anime/Z-Image) benefits significantly from NVFP4 because, unlike FLUX and SDXL, it lacks FP8-optimized kernels. On GB10, NVFP4 ran 1.71× faster than FP8 at 50 steps (56.5 s vs. 96.5 s).

Base Model

SeeSee21/Z-Anime: S3-DiT architecture, fine-tuned for anime-style generation. Apache 2.0 license.


Quantized by r0b0tlab on NVIDIA GB10.

