---
license: apache-2.0
base_model:
- baidu/ERNIE-Image-Turbo
tags:
- mlx
- image-generation
- diffusion
- apple-silicon
- ernie-image
library_name: mlx
---

# ERNIE-Image-Turbo-MLX

Pre-converted MLX weights for [baidu/ERNIE-Image-Turbo](https://huggingface.co/baidu/ERNIE-Image-Turbo). Runs on Apple Silicon via [mlx-ernie-image](https://github.com/treadon/mlx-ernie-image).

## What's included

| File | Size | Component |
|------|------|-----------|
| `dit.npz` | 16.1 GB | DiT (8B, 36 layers), pre-transposed for MLX |
| `vae.npz` | 100 MB | FLUX.2 VAE decoder, pre-transposed for MLX |
| `bn_stats.npz` | tiny | Batch-norm running stats for latent denormalization |
| `config.json` | tiny | DiT architecture config |

Conv2d weights are pre-transposed from PyTorch's NCHW layout to MLX's NHWC layout, so no conversion is needed at runtime.

## Usage

```python
from ernie_image import ErnieImagePipeline, TextEncoder

te = TextEncoder.from_pretrained()
pipe = ErnieImagePipeline.from_pretrained("treadon/ERNIE-Image-Turbo-MLX")

emb = te.encode("A vibrant manga comic about a cat and a dragon")
img = pipe.generate(text_embeddings=emb)
img.save("output.png")
```

## Benchmarks (1024x1024, 8 steps, M4 Pro 64GB)

### MLX vs PyTorch/MPS

| Pipeline | Total | Per step |
|----------|-------|----------|
| PyTorch/MPS (diffusers) | 137.0 s | 17.1 s/step |
| **MLX (this repo)** | **134.2 s** | **16.0 s/step** |

### Breakdown

| Component | Time |
|-----------|------|
| Text encode (PyTorch) | 0.1 s |
| Denoise (MLX) | 128 s |
| VAE decode (MLX) | 6 s |
| **Total** | **~134 s** |

## Code

[github.com/treadon/mlx-ernie-image](https://github.com/treadon/mlx-ernie-image)

**[Follow @treadon on X](https://x.com/treadon)** for more ML experiments.

## Base Model

[baidu/ERNIE-Image-Turbo](https://huggingface.co/baidu/ERNIE-Image-Turbo): 8B DiT + Mistral-3 text encoder + FLUX.2 VAE. Apache 2.0.
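
## Appendix: weight layout conversion

The Conv2d pre-transpose described in "What's included" can be sketched as below. This is an illustrative NumPy sketch, not the converter actually used for this repo; it assumes the standard PyTorch Conv2d weight layout `(out, in, kH, kW)` and MLX's expected `(out, kH, kW, in)` layout.

```python
import numpy as np

def convert_conv2d_weight(torch_weight: np.ndarray) -> np.ndarray:
    """Transpose a PyTorch Conv2d weight (O, I, kH, kW) into
    the (O, kH, kW, I) layout MLX's conv expects."""
    return np.transpose(torch_weight, (0, 2, 3, 1))

# Toy example: a 3x3 conv mapping 16 channels to 32
w_torch = np.zeros((32, 16, 3, 3), dtype=np.float32)
w_mlx = convert_conv2d_weight(w_torch)
print(w_mlx.shape)  # (32, 3, 3, 16)
```

Baking this transpose into `dit.npz` and `vae.npz` is what lets the weights load directly at runtime with no per-launch conversion cost.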
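
The latent denormalization that `bn_stats.npz` supports can be sketched roughly as follows. This is a hedged illustration of inverting a batch-norm normalization with stored running statistics; the function name, array shapes, and `eps` value are assumptions, not this repo's actual keys or pipeline code.

```python
import numpy as np

def denormalize_latents(latents: np.ndarray,
                        running_mean: np.ndarray,
                        running_var: np.ndarray,
                        eps: float = 1e-5) -> np.ndarray:
    """Invert per-channel batch-norm normalization:
    x = z * sqrt(var + eps) + mean, broadcast over NHWC latents."""
    return latents * np.sqrt(running_var + eps) + running_mean

# Toy example with 4 latent channels
latents = np.random.randn(1, 8, 8, 4).astype(np.float32)
mean = np.zeros(4, dtype=np.float32)
var = np.ones(4, dtype=np.float32)
decoded_input = denormalize_latents(latents, mean, var)
print(decoded_input.shape)  # (1, 8, 8, 4)
```

The denormalized latents would then be handed to the VAE decoder to produce the final image.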