Z-Anime NVFP4: Blackwell Optimized
Fully NVFP4-quantized versions of SeeSee21/Z-Anime (S3-DiT architecture) for NVIDIA Blackwell hardware, for use with ComfyUI + comfy_kitchen.
Sample Output
1024×1024, 50 steps, CFG 4.0, euler_ancestral, seed 7777
Benchmark (GB10)
| Metric | BF16 (original) | FP8 | NVFP4 | NVFP4 gain |
|---|---|---|---|---|
| Model size | 11.5 GB | 5.8 GB | 3.5 GB | 1.7× smaller than FP8, 3.3× smaller than BF16 |
| Inference (50 steps) | not reported | 96.5 s | 56.5 s | 1.71× faster than FP8 |
| Inference (4-step distill) | not reported | ~8 s | ~5 s | ~1.6× faster than FP8 |
All benchmarks: 1024×1024, seed 7777, NVIDIA GB10.
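The ratio column can be re-derived from the raw measurements. The sketch below also adds an idealized size estimate that is not in the card: it assumes NVFP4 stores 4-bit values plus one 8-bit scale per 16 values (4.5 bits per weight) against 16 bits per weight for BF16. The measured 3.5 GB sitting above that ideal would be consistent with some tensors staying in higher precision, but the card does not say which ones.

```python
# Measured values copied from the benchmark table (GB10, 1024x1024, seed 7777).
FP8_SIZE_GB, NVFP4_SIZE_GB, BF16_SIZE_GB = 5.8, 3.5, 11.5
FP8_50STEP_S, NVFP4_50STEP_S = 96.5, 56.5


def ratio(baseline: float, new: float) -> float:
    """How many times larger (or slower) the baseline is than the new value."""
    return baseline / new


size_vs_fp8 = ratio(FP8_SIZE_GB, NVFP4_SIZE_GB)    # ~1.66
size_vs_bf16 = ratio(BF16_SIZE_GB, NVFP4_SIZE_GB)  # ~3.29
speedup_50 = ratio(FP8_50STEP_S, NVFP4_50STEP_S)   # ~1.71

# Assumed idealized NVFP4 storage: 4-bit values + one 8-bit scale per 16 values
# = 4.5 bits/weight (the single FP32 per-tensor scale is negligible); BF16 = 16 bits.
ideal_nvfp4_gb = BF16_SIZE_GB * 4.5 / 16           # ~3.23 if every tensor were quantized

print(f"size vs FP8: {size_vs_fp8:.2f}x, size vs BF16: {size_vs_bf16:.2f}x, "
      f"50-step speedup: {speedup_50:.2f}x, all-NVFP4 estimate: {ideal_nvfp4_gb:.2f} GB")
```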
Hardware Requirements
- NVIDIA Blackwell GPU (SM ≥ 10.0): GB10, GB200, RTX 5090
- ComfyUI with comfy_kitchen support
- On non-Blackwell hardware, the model falls back to dequantization (functional but slow)
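To see which side of the SM ≥ 10.0 line a machine falls on, a minimal sketch can read the CUDA compute capability with PyTorch's `torch.cuda.get_device_capability`, which returns `(major, minor)` (an RTX 5090, for example, reports `(12, 0)`). This only mirrors the requirement stated above. It does not query comfy_kitchen, which makes the actual decision about the native path versus the dequantization fallback. The helper names are hypothetical.

```python
def has_native_nvfp4(capability: tuple[int, int]) -> bool:
    """True if a CUDA compute capability (major, minor) is SM 10.0 or newer."""
    return capability >= (10, 0)


def detect_capability():
    """Return (major, minor) of CUDA device 0, or None if no usable GPU/torch."""
    try:
        import torch  # optional dependency, imported lazily
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


if __name__ == "__main__":
    cap = detect_capability()
    if cap is None:
        print("No CUDA GPU detected")
    elif has_native_nvfp4(cap):
        print(f"SM {cap[0]}.{cap[1]}: meets the Blackwell NVFP4 requirement")
    else:
        print(f"SM {cap[0]}.{cap[1]}: expect the slower dequantization fallback")
```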
Variants
- `z-anime-distill-4step-nvfp4.safetensors`: distilled 4-step model (recommended for speed)
- `z-anime-base-nvfp4.safetensors`: base model, 28–50 steps with CFG
Setup
- Place the safetensors files in `ComfyUI/models/diffusion_models/`
- Download the CLIP text encoder (Qwen 3 4B) and the VAE from SeeSee21/Z-Anime
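The first setup step can be scripted with `huggingface_hub.hf_hub_download`, writing straight into ComfyUI's diffusion-model folder via `local_dir`. This sketch assumes the two variant files sit at the root of the `r0b0tlab/Z-Anime-NVFP4` repo and that ComfyUI lives in `./ComfyUI`. The helper names are hypothetical. It does not fetch the CLIP and VAE files, because the card does not give their paths inside SeeSee21/Z-Anime.

```python
from pathlib import Path

REPO_ID = "r0b0tlab/Z-Anime-NVFP4"
# Assumed to live at the repo root under these names (taken from the Variants list).
VARIANTS = {
    "distill": "z-anime-distill-4step-nvfp4.safetensors",
    "base": "z-anime-base-nvfp4.safetensors",
}


def target_dir(comfy_root: str) -> Path:
    """Folder ComfyUI scans for diffusion models, relative to the ComfyUI root."""
    return Path(comfy_root) / "models" / "diffusion_models"


def download_variant(variant: str, comfy_root: str = "ComfyUI") -> str:
    """Fetch one NVFP4 variant into ComfyUI's diffusion_models folder; returns the local path."""
    from huggingface_hub import hf_hub_download  # requires `pip install huggingface_hub`

    filename = VARIANTS[variant]  # raises KeyError for unknown variant names
    dest = target_dir(comfy_root)
    dest.mkdir(parents=True, exist_ok=True)
    return hf_hub_download(repo_id=REPO_ID, filename=filename, local_dir=dest)
```

Usage: `download_variant("distill")` for the 4-step model, or `download_variant("base", "/opt/ComfyUI")` for a ComfyUI install elsewhere.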
ComfyUI Workflow
- UNETLoader: variant safetensors, `weight_dtype`: `default`
- CLIPLoader: `qwen_3_4b.safetensors`, type: `qwen_image`
- VAELoader: Z-Anime VAE
- ModelSamplingAuraFlow: `shift=3.5`
- KSampler: `euler_ancestral` sampler, `beta` scheduler
  - Distill: 4 steps, CFG 1.0
  - Base: 28–50 steps, CFG 3.0–5.0
- Resolution: 832×1216 (portrait), 1216×832 (landscape), 1024×1024
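The node list above can also be expressed in ComfyUI's API (JSON) format and queued on a running server through its `POST /prompt` endpoint. The sketch below is based on several assumptions that are not part of the card. Node class names and input keys are assumed to match ComfyUI's built-in nodes. The `CLIPTextEncode`, `EmptySD3LatentImage`, `VAEDecode` and `SaveImage` nodes, the negative prompt, the VAE filename `z_anime_vae.safetensors` and the default address `127.0.0.1:8188` were all added to make the graph complete. Defaults follow the distill settings and the portrait resolution.

```python
import json
import urllib.request

UNET_FILE = "z-anime-distill-4step-nvfp4.safetensors"
CLIP_FILE = "qwen_3_4b.safetensors"
VAE_FILE = "z_anime_vae.safetensors"  # hypothetical name: use the VAE file from SeeSee21/Z-Anime


def build_workflow(prompt: str, negative: str = "", *, steps: int = 4, cfg: float = 1.0,
                   width: int = 832, height: int = 1216, seed: int = 7777) -> dict:
    """API-format graph: node id -> {class_type, inputs}; links are [node_id, output_index]."""
    return {
        "1": {"class_type": "UNETLoader",
              "inputs": {"unet_name": UNET_FILE, "weight_dtype": "default"}},
        "2": {"class_type": "CLIPLoader",
              "inputs": {"clip_name": CLIP_FILE, "type": "qwen_image"}},
        "3": {"class_type": "VAELoader", "inputs": {"vae_name": VAE_FILE}},
        "4": {"class_type": "ModelSamplingAuraFlow",
              "inputs": {"model": ["1", 0], "shift": 3.5}},
        "5": {"class_type": "CLIPTextEncode", "inputs": {"text": prompt, "clip": ["2", 0]}},
        "6": {"class_type": "CLIPTextEncode", "inputs": {"text": negative, "clip": ["2", 0]}},
        # Assumed latent node; the card does not name one.
        "7": {"class_type": "EmptySD3LatentImage",
              "inputs": {"width": width, "height": height, "batch_size": 1}},
        "8": {"class_type": "KSampler", "inputs": {
            "model": ["4", 0], "positive": ["5", 0], "negative": ["6", 0],
            "latent_image": ["7", 0], "seed": seed, "steps": steps, "cfg": cfg,
            "sampler_name": "euler_ancestral", "scheduler": "beta", "denoise": 1.0}},
        "9": {"class_type": "VAEDecode", "inputs": {"samples": ["8", 0], "vae": ["3", 0]}},
        "10": {"class_type": "SaveImage",
               "inputs": {"images": ["9", 0], "filename_prefix": "z-anime"}},
    }


def queue_prompt(workflow: dict, host: str = "127.0.0.1:8188") -> dict:
    """POST the graph to a running ComfyUI server and return its JSON reply."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(f"http://{host}/prompt", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For the base variant, swap `UNET_FILE` for the base safetensors and call `build_workflow(prompt, steps=40, cfg=4.0)`.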
Recommended Settings
Distill 4-step
| Parameter | Value |
|---|---|
| Steps | 4 |
| CFG | 1.0 |
| Sampler | euler_ancestral |
| Scheduler | beta |
| Shift | 3.5 |
Base
| Parameter | Value |
|---|---|
| Steps | 28–50 |
| CFG | 3.0–5.0 |
| Sampler | euler_ancestral |
| Scheduler | beta |
| Shift | 3.5 |
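The two tables can be encoded as presets so a script can flag settings that drift outside the recommended ranges. This is a minimal sketch. The `Preset` type and `check_settings` helper are hypothetical, and the ranges are copied from the tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preset:
    steps: tuple[int, int]    # inclusive (min, max)
    cfg: tuple[float, float]  # inclusive (min, max)
    sampler: str = "euler_ancestral"
    scheduler: str = "beta"
    shift: float = 3.5


PRESETS = {
    "distill": Preset(steps=(4, 4), cfg=(1.0, 1.0)),
    "base": Preset(steps=(28, 50), cfg=(3.0, 5.0)),
}


def check_settings(variant: str, steps: int, cfg: float) -> list[str]:
    """Return warnings when steps or CFG fall outside the recommended range."""
    p = PRESETS[variant]
    warnings = []
    if not p.steps[0] <= steps <= p.steps[1]:
        warnings.append(f"{variant}: steps={steps} outside {p.steps[0]}-{p.steps[1]}")
    if not p.cfg[0] <= cfg <= p.cfg[1]:
        warnings.append(f"{variant}: cfg={cfg} outside {p.cfg[0]}-{p.cfg[1]}")
    return warnings
```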
Quantization Details
- Method: `comfy_kitchen.quantize_nvfp4()` with per-tensor scaling + 16× block padding
- Format: uint8-packed FP4 (E2M1) + `float8_e4m3fn` block scales + float32 per-tensor scale
- Source: BF16 → NVFP4 (direct, no FP8 dequant intermediary)
- Distill 4-step: 171 layers quantized
- Base: 239 layers quantized
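To show what the stored format means, here is a pure-Python reference sketch of NVFP4-style two-level scaling. It is not comfy_kitchen's implementation, and all function names are hypothetical. The sketch assumes several details from the commonly described NVFP4 recipe rather than from this card: blocks of 16 values, each block scale rounded to E4M3, and a per-tensor scale of `amax / (6 * 448)` so block scales fit E4M3's range. It also assumes values are rounded to the nearest E2M1 magnitude and that codes are packed low nibble first.

```python
import math

E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # index == 3-bit magnitude code
FP4_MAX = 6.0     # largest E2M1 magnitude
E4M3_MAX = 448.0  # largest finite float8_e4m3fn value
BLOCK = 16        # values sharing one FP8 block scale


def round_to_e4m3(x: float) -> float:
    """Round a non-negative float to the nearest float8_e4m3fn value (ties to even)."""
    if x <= 0.0:
        return 0.0
    x = min(x, E4M3_MAX)
    _, e = math.frexp(x)          # x = m * 2**e with 0.5 <= m < 1
    step = 2.0 ** max(e - 4, -9)  # 3 mantissa bits; subnormal spacing is 2**-9
    return min(round(x / step) * step, E4M3_MAX)


def encode_e2m1(v: float) -> int:
    """Map a block-scaled value to a 4-bit code: sign bit 0x8 | magnitude index."""
    sign = 0x8 if v < 0 else 0x0
    mag = min(abs(v), FP4_MAX)
    # Nearest grid point; on an exact tie min() keeps the smaller magnitude.
    idx = min(range(len(E2M1_GRID)), key=lambda i: abs(E2M1_GRID[i] - mag))
    return sign | idx


def quantize_nvfp4_reference(values) -> tuple[bytes, list[float], float]:
    """Return (packed FP4 bytes, per-block E4M3 scales, FP32 per-tensor scale)."""
    padded = list(values) + [0.0] * (-len(values) % BLOCK)  # pad to whole blocks
    amax = max((abs(v) for v in padded), default=0.0)
    global_scale = amax / (FP4_MAX * E4M3_MAX) if amax > 0 else 1.0
    codes, block_scales = [], []
    for start in range(0, len(padded), BLOCK):
        block = padded[start:start + BLOCK]
        s = round_to_e4m3(max(abs(v) for v in block) / FP4_MAX / global_scale)
        block_scales.append(s)
        eff = s * global_scale  # effective dequantization scale for this block
        codes.extend(encode_e2m1(v / eff) if eff > 0 else 0 for v in block)
    packed = bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, len(codes), 2))
    return packed, block_scales, global_scale


def dequantize_nvfp4_reference(packed: bytes, block_scales: list[float],
                               global_scale: float, n: int) -> list[float]:
    """Undo the packing and scaling; returns the first n reconstructed values."""
    codes = []
    for byte in packed:
        codes += [byte & 0xF, byte >> 4]
    out = []
    for i, c in enumerate(codes[:n]):
        mag = E2M1_GRID[c & 0x7]
        out.append((-mag if c & 0x8 else mag) * block_scales[i // BLOCK] * global_scale)
    return out
```

Usage: `packed, scales, g = quantize_nvfp4_reference(weights)` stores 4 bits per value plus one FP8 scale per 16 values. `dequantize_nvfp4_reference(packed, scales, g, len(weights))` reconstructs the approximation that the dequantization fallback has to compute.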
Why NVFP4?
The S3-DiT architecture (Z-Anime/Z-Image) benefits significantly from NVFP4 because it lacks the FP8-optimized kernels that FLUX/SDXL have. On GB10, NVFP4 delivered a measured 1.71× speedup over FP8 for 50-step inference at 1024×1024.
Base Model
SeeSee21/Z-Anime: S3-DiT architecture, fine-tuned for anime-style generation. Apache 2.0 license.
Quantized by r0b0tlab on NVIDIA GB10.
