🔢 FP8 Quantized Version - ComfyUI Compatible

These are the fp8-e4m3fn-scaled, fp8-e4m3fn, fp8_e5m2-scaled, and fp8_e5m2 quantized versions of the Z-Image model, prepared for ComfyUI workflows. The quantized formats significantly reduce VRAM requirements while maintaining high image quality, making the model more accessible on consumer-grade GPUs.

Quantization Formats:

fp8-e4m3fn-scaled
fp8-e4m3fn
fp8_e5m2-scaled
fp8_e5m2
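
As a rough illustration of how the two FP8 layouts differ, the sketch below simulates rounding to E4M3FN (4 exponent bits, 3 mantissa bits, largest finite value 448) and E5M2 (5 exponent bits, 2 mantissa bits, largest finite value 57344) in plain Python. It assumes that the "-scaled" variants store a per-tensor scale next to the FP8 weights; this card does not document the exact scheme, and the helper names `fp8_round` and `quantize_scaled` are hypothetical.

```python
import math

# (exponent bits, mantissa bits, largest finite value) for each FP8 layout.
E4M3FN = (4, 3, 448.0)
E5M2 = (5, 2, 57344.0)


def fp8_round(x, fmt):
    """Round x to the nearest value representable in the given FP8 layout,
    saturating at the largest finite value (simulation only)."""
    exp_bits, man_bits, max_value = fmt
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), max_value)
    bias = 2 ** (exp_bits - 1) - 1
    # frexp gives a = m * 2**e with 0.5 <= m < 1, so floor(log2(a)) == e - 1.
    exponent = max(math.frexp(a)[1] - 1, 1 - bias)  # clamp into subnormal range
    step = 2.0 ** (exponent - man_bits)  # spacing between neighbouring values
    return sign * min(round(a / step) * step, max_value)


def quantize_scaled(weights, fmt):
    """Assumed per-tensor scaling: map the largest magnitude onto the format
    maximum, round, and return (quantized values, scale)."""
    amax = max(abs(w) for w in weights)
    scale = amax / fmt[2] if amax > 0 else 1.0
    return [fp8_round(w / scale, fmt) for w in weights], scale


weights = [0.001, -0.002, 0.004]
plain = [fp8_round(w, E4M3FN) for w in weights]
quantized, scale = quantize_scaled(weights, E4M3FN)
restored = [q * scale for q in quantized]
print("unscaled e4m3fn:", plain)
print("scaled e4m3fn:  ", restored)
```

In this toy case the unscaled cast collapses 0.001 and 0.002 onto the same subnormal magnitude, while the scaled path recovers the inputs almost exactly. E5M2 gives up one mantissa bit in exchange for a wider range.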

ComfyUI Workflow

📸 Example Outputs

BF16 / fp8-e4m3fn

Prompt: An Asian woman is seated in a commanding, symmetrical pose on a stark, matte white architectural block, viewed from a distinct low angle looking upward, which elongates her silhouette against the background. She is attired in a structured, high-fashion suit constructed from heavy, lustrous silk brocade in a rich emerald green, embossed with a complex, swirling floral texture that catches the studio illumination. The jacket features sharp, exaggerated shoulder pads and wide lapels, tapering slightly at the waist, while the matching trousers are cut with an extremely wide leg, draping in thick, vertical folds that obscure the shape of her lower limbs. Her legs are positioned wide apart, knees angled outward well beyond the width of her shoulders, creating a distinct triangular negative space between her inner thighs. Her hands rest deliberately on the front upper edge of the white cube, located centrally between her legs, with palms flat against the surface and long, slender fingers draping slightly over the rim. Her head is tilted slightly downward to maintain direct eye contact with the low-positioned camera, her expression stoic and poised, featuring subtle contouring makeup and a satin-finish berry lipstick. Her straight black hair is parted precisely in the center and slicked back tightly, falling in a clean line down her back. The lighting is crisp and directional, casting a defined shadow behind her on the seamless, pale grey background while accentuating the weave of the fabric and the sharp, geometric corners of the seating block.

Seed: 539314584494176 | Steps: 30 | CFG: 4 | Sampler: Euler

BF16 / fp8-e4m3fn-scaled

Same prompt, seed, and settings as the BF16 / fp8-e4m3fn comparison above.

Benefits:

Reduced memory footprint (~50% VRAM savings for model weights compared with BF16)
Faster inference times
Full ComfyUI compatibility
Minimal quality degradation
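
The ~50% figure follows from storage width: BF16 uses 2 bytes per parameter and FP8 uses 1. The sketch below works through that arithmetic for the weights only. The parameter count is a placeholder chosen for illustration and is not taken from this card, and `weight_memory_gib` is a hypothetical helper.

```python
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}


def weight_memory_gib(num_params, dtype):
    """Memory needed for the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


# Placeholder parameter count for illustration; not taken from this card.
num_params = 6_000_000_000

bf16_gib = weight_memory_gib(num_params, "bf16")
fp8_gib = weight_memory_gib(num_params, "fp8")
savings = 1 - fp8_gib / bf16_gib
print(f"bf16: {bf16_gib:.1f} GiB, fp8: {fp8_gib:.1f} GiB, savings: {savings:.0%}")
```

Activations and any other components loaded alongside the model are not counted here, so the end-to-end VRAM reduction in a real workflow may be smaller than the weight-only figure.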

⚡️- Image
An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer

🎨 Z-Image

Z-Image is the foundation model of the ⚡️- Image family, engineered for good quality, robust generative diversity, broad stylistic coverage, and precise prompt adherence.

While Z-Image-Turbo is built for speed, Z-Image is a full-capacity, undistilled transformer designed to be the backbone for creators, researchers, and developers who require the highest level of creative freedom.

🌟 Key Features

Undistilled Foundation: As a non-distilled base model, Z-Image preserves the complete training signal. It supports full Classifier-Free Guidance (CFG), providing the precision required for complex prompt engineering and professional workflows.
Aesthetic Versatility: Z-Image masters a vast spectrum of visual languages—from hyper-realistic photography and cinematic digital art to intricate anime and stylized illustrations. It is the ideal engine for scenarios requiring rich, multi-dimensional expression.
Enhanced Output Diversity: Built for exploration, Z-Image delivers significantly higher variability in composition, facial identity, and lighting across different seeds, ensuring that multi-person scenes remain distinct and dynamic.
Built for Development: The ideal starting point for the community. Its non-distilled nature makes it a good base for LoRA training, structural conditioning (ControlNet) and semantic conditioning.
Robust Negative Control: Responds with high fidelity to negative prompting, allowing users to reliably suppress artifacts and adjust compositions.
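
For readers unfamiliar with how CFG and negative prompting interact, here is a minimal sketch of the standard classifier-free guidance combination, using plain Python lists as stand-ins for model predictions. It is the textbook formula only; this card does not describe how Z-Image or ComfyUI apply it internally, and `cfg_combine` is a hypothetical name.

```python
def cfg_combine(cond, uncond, guidance_scale):
    """Classifier-free guidance: extrapolate from the unconditional (or
    negative-prompt) prediction toward the conditional one."""
    return [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]


# Toy per-element predictions standing in for the model's outputs.
cond_pred = [1.0, 2.0]      # prediction given the positive prompt
negative_pred = [0.5, 2.0]  # prediction given the negative (or empty) prompt

guided = cfg_combine(cond_pred, negative_pred, guidance_scale=4.0)
print(guided)
```

With a guidance scale of 1 the result equals the conditional prediction; larger values push the output further away from whatever the negative prompt describes, which is why a model that supports full CFG can also honor negative prompts.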

🆚 Z-Image vs Z-Image-Turbo

Aspect	Z-Image	Z-Image-Turbo
CFG	✅	❌
Steps	28-50	8
Fine-tunability	✅	❌
Negative Prompting	✅	❌
Diversity	High	Low
Visual Quality	High	Very High
RL	❌	✅


Paper: Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer (arXiv:2511.22699, published Nov 27, 2025)