Z-Image-Turbo — MLX (4-bit Quantized)

MLX conversion of Tongyi-MAI/Z-Image-Turbo for Apple Silicon.

This is the 4-bit quantized MLX conversion. Linear layer weights are quantized to 4-bit with group_size=64. VAE remains in float16 to preserve image quality.

Model size: 6.48 GB

All Available MLX Variants

| Variant | Size | Quantization | Link |
|---|---|---|---|
| Full Precision (fp16) | 20.54 GB | None | andrevp/Z-Image-Turbo-MLX |
| 8-bit | 11.37 GB | 8-bit, group_size=64 | andrevp/Z-Image-Turbo-MLX-8bit |
| 4-bit | 6.48 GB | 4-bit, group_size=64 | andrevp/Z-Image-Turbo-MLX-4bit |
| 2-bit | 4.04 GB | 2-bit, group_size=64 | andrevp/Z-Image-Turbo-MLX-2bit |

About Z-Image-Turbo

Z-Image is an efficient 6B-parameter image generation foundation model using a Scalable Single-Stream Diffusion Transformer (S3-DiT) architecture. Z-Image-Turbo is the distilled variant with only 8 NFEs (Number of Function Evaluations), achieving sub-second inference latency.

Key Features

  • Photorealistic image generation with state-of-the-art quality
  • Bilingual text rendering (English & Chinese)
  • Strong instruction adherence
  • 8-step inference — distilled via Decoupled-DMD + Reinforcement Learning (DMDR)
  • No CFG required — guidance_scale=0.0

Architecture

| Component | Architecture | Size |
|---|---|---|
| Text Encoder | Qwen3 (36 layers, hidden_size=2560, GQA with 32/8 heads) | ~7.8 GB (fp16) |
| Transformer | ZImageTransformer2DModel (30 layers, dim=3840, 30 heads) | ~12.3 GB (fp16) |
| VAE | AutoencoderKL (from Flux, 16 latent channels) | ~160 MB (fp16) |
| Tokenizer | Qwen2Tokenizer (vocab_size=151,936) | — |
| Scheduler | FlowMatchEulerDiscreteScheduler | — |

The S3-DiT architecture concatenates text tokens, visual semantic tokens, and image VAE tokens at the sequence level as a unified input stream, maximizing parameter efficiency compared to dual-stream approaches.
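The single-stream idea can be illustrated with a toy NumPy sketch (the token counts and width below are illustrative stand-ins, not the model's actual shapes):

```python
import numpy as np

dim = 8  # toy model width (Z-Image's transformer uses dim=3840)

# Stand-ins for the three token streams fed to the transformer
text_tokens = np.random.randn(12, dim)     # prompt tokens from the text encoder
semantic_tokens = np.random.randn(4, dim)  # visual semantic tokens
vae_tokens = np.random.randn(64, dim)      # image latent (VAE) tokens

# S3-DiT concatenates all streams along the sequence axis, so a single
# transformer stack attends over one unified sequence instead of
# maintaining separate text/image branches.
stream = np.concatenate([text_tokens, semantic_tokens, vae_tokens], axis=0)
print(stream.shape)  # (80, 8)
```

Because every layer's parameters are shared across all token types, capacity is not split between parallel branches, which is where the parameter-efficiency advantage over dual-stream designs comes from.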

Quantization Details

| Parameter | Value |
|---|---|
| Bits | 4 |
| Group Size | 64 |
| Quantized Components | Text Encoder (Qwen3), Transformer (ZImageTransformer2DModel) |
| Non-Quantized Components | VAE (AutoencoderKL) — kept at float16 for image quality |
| Quantized Tensors | 526 Linear layer weight tensors |
| Method | MLX group quantization (mlx.core.quantize) |

Only 2D weight tensors from Linear layers are quantized. Normalization layers, biases, embeddings, and position encodings remain in float16.
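The scheme can be sketched in plain NumPy (an illustrative reimplementation of affine group quantization; the actual conversion uses `mlx.core.quantize`, whose packing details differ):

```python
import numpy as np

def quantize_group(w, group_size=64, bits=4):
    """Affine group quantization: each run of `group_size` weights
    shares one fp16 scale and one fp16 offset."""
    levels = 2**bits - 1
    groups = w.reshape(-1, group_size)
    lo = groups.min(axis=1, keepdims=True)
    hi = groups.max(axis=1, keepdims=True)
    scale = (hi - lo) / levels
    q = np.clip(np.round((groups - lo) / scale), 0, levels).astype(np.uint8)
    return q, scale.astype(np.float16), lo.astype(np.float16)

def dequantize_group(q, scale, lo):
    """Recover an approximation of the original weights."""
    return q.astype(np.float32) * scale.astype(np.float32) + lo.astype(np.float32)

rng = np.random.default_rng(0)
w = rng.standard_normal((128, 64)).astype(np.float32)  # a toy Linear weight
q, scale, lo = quantize_group(w)
w_hat = dequantize_group(q, scale, lo).reshape(w.shape)
print(np.abs(w - w_hat).max())  # small per-group reconstruction error
```

With 4 bits there are only 16 representable levels per group, which is why a small group size (64) matters: each group's scale adapts to its local weight range, keeping the rounding error bounded by half a quantization step.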

Component Sizes

| Component | Original (bf16) | This Variant (4-bit Quantized) |
|---|---|---|
| Text Encoder | ~7.8 GB | ~2.8 GB |
| Transformer | ~12.3 GB | ~3.5 GB |
| VAE | 160 MB | 160 MB |
| Total | ~20.3 GB | 6.48 GB |

Original Model

This model is a quantized MLX conversion of Tongyi-MAI/Z-Image-Turbo.

Original Usage (PyTorch/CUDA)

```python
import torch
from diffusers import ZImagePipeline

pipe = ZImagePipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

prompt = "Young Chinese woman in red Hanfu, intricate embroidery, ancient temple backdrop"

image = pipe(
    prompt=prompt,
    height=1024,
    width=1024,
    num_inference_steps=9,   # results in 8 DiT forward passes
    guidance_scale=0.0,      # no CFG for Turbo models
    generator=torch.Generator("cuda").manual_seed(42),
).images[0]

image.save("example.png")
```

Conversion Details

  • Converted using MLX 0.30.6 on Apple Silicon
  • Weights converted from bfloat16 to float16
  • SafeTensors format (MLX-compatible)
  • All weight keys preserved and verified
  • VAE kept at float16 across all quantization levels
  • Verified: no NaN/Inf values, all shapes consistent, all index files valid
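The per-tensor sanity checks can be mirrored with a small script (a hypothetical `verify_tensors` helper over already-loaded arrays; in practice each SafeTensors shard would be loaded with e.g. `mlx.core.load` first):

```python
import numpy as np

def verify_tensors(tensors):
    """Sanity checks mirroring the conversion verification:
    every tensor is finite and has no degenerate dimensions."""
    for name, t in tensors.items():
        arr = np.asarray(t, dtype=np.float32)
        assert np.isfinite(arr).all(), f"NaN/Inf in {name}"
        assert all(d > 0 for d in arr.shape), f"degenerate shape in {name}"
    return True

# Toy example with a single fp16 weight
print(verify_tensors({"layer.weight": np.ones((4, 4), dtype=np.float16)}))  # True
```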

Citation

```bibtex
@article{z-image2025,
    title={Z-Image: An Efficient Image Generation Foundation Model with Scalable Single Stream Diffusion Transformer},
    author={Tongyi MAI Team},
    journal={arXiv preprint arXiv:2511.22699},
    year={2025}
}

@article{decoupled-dmd2025,
    title={Decoupled Consistency Model Distillation},
    author={Liu et al.},
    journal={arXiv preprint arXiv:2511.22677},
    year={2025}
}

@article{dmdr2025,
    title={DMDR: Fusing DMD with Reinforcement Learning},
    author={Jiang et al.},
    journal={arXiv preprint arXiv:2511.13649},
    year={2025}
}
```