---
license: apache-2.0
base_model:
- baidu/ERNIE-Image-Turbo
tags:
- mlx
- image-generation
- diffusion
- apple-silicon
- ernie-image
library_name: mlx
---

# ERNIE-Image-Turbo-MLX

Pre-converted MLX weights for [baidu/ERNIE-Image-Turbo](https://huggingface.co/baidu/ERNIE-Image-Turbo). Runs on Apple Silicon via [mlx-ernie-image](https://github.com/treadon/mlx-ernie-image).

## What's included

| File | Size | Component |
|------|------|-----------|
| `dit.npz` | 16.1 GB | DiT (8B, 36 layers), pre-transposed for MLX |
| `vae.npz` | 100 MB | FLUX.2 VAE decoder, pre-transposed for MLX |
| `bn_stats.npz` | tiny | Batch norm running stats for latent denormalization |
| `config.json` | tiny | DiT architecture config |

Conv2d weights are pre-transposed from PyTorch's channels-first layout (NCHW activations, `(out, in, kH, kW)` kernels) to MLX's channels-last layout (NHWC activations, `(out, kH, kW, in)` kernels), so no conversion is needed at runtime.
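
For readers converting their own checkpoints, the transpose described above can be sketched in a few lines. This is a minimal NumPy illustration, not the conversion script used for this repo: the helper name `torch_conv_to_mlx` and the example kernel shape are hypothetical. It assumes a PyTorch Conv2d kernel stored as `(out, in, kH, kW)` and produces the `(out, kH, kW, in)` order that MLX convolutions expect.

```python
import numpy as np


def torch_conv_to_mlx(weight: np.ndarray) -> np.ndarray:
    """Reorder a Conv2d kernel from (out, in, kH, kW) to (out, kH, kW, in)."""
    if weight.ndim != 4:
        raise ValueError(f"expected a 4-D conv kernel, got shape {weight.shape}")
    # Move the input-channel axis to the end; copy so the result is contiguous.
    return np.ascontiguousarray(weight.transpose(0, 2, 3, 1))


# Hypothetical 3x3 kernel mapping 4 input channels to 8 output channels.
w_torch = np.arange(8 * 4 * 3 * 3, dtype=np.float32).reshape(8, 4, 3, 3)
w_mlx = torch_conv_to_mlx(w_torch)

print(w_mlx.shape)  # (8, 3, 3, 4)
```

Doing this once at conversion time, rather than on every load, is what lets the `.npz` files in this repo be used as-is.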

## Usage

```python
from ernie_image import ErnieImagePipeline, TextEncoder

# Load the text encoder and the pre-converted MLX weights from this repo.
te = TextEncoder.from_pretrained()
pipe = ErnieImagePipeline.from_pretrained("treadon/ERNIE-Image-Turbo-MLX")

# Encode the prompt, generate the image, and save it to disk.
emb = te.encode("A vibrant manga comic about a cat and a dragon")
img = pipe.generate(text_embeddings=emb)
img.save("output.png")
```

## Benchmarks (1024x1024, 8 steps, M4 Pro 64GB)

### MLX vs PyTorch/MPS

| Pipeline | Total | Per Step |
|----------|-------|----------|
| PyTorch/MPS (diffusers) | 137.0s | 17.1s/step |
| **MLX (this repo)** | **134.2s** | **16.0s/step** |

### Breakdown

| Component | Time |
|-----------|------|
| Text encode (PyTorch) | 0.1s |
| Denoise (MLX) | 128s |
| VAE decode (MLX) | 6s |
| **Total** | **~134s** |

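
The figures in the two tables can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers reported above; the variable names are illustrative. Note one inference that the benchmark itself does not state: the MLX per-step figure matches the denoise time divided by 8 steps, while the PyTorch/MPS per-step figure matches the total time divided by 8 steps.

```python
STEPS = 8

# Reported MLX breakdown, in seconds.
breakdown_s = {"text_encode": 0.1, "denoise": 128.0, "vae_decode": 6.0}

# Reported end-to-end totals, in seconds.
mlx_total_s = 134.2
mps_total_s = 137.0

breakdown_total_s = sum(breakdown_s.values())
mlx_denoise_per_step = breakdown_s["denoise"] / STEPS
mps_total_per_step = mps_total_s / STEPS
speedup = mps_total_s / mlx_total_s

print(f"{breakdown_total_s:.1f}")   # 134.1
print(mlx_denoise_per_step)         # 16.0
print(f"{mps_total_per_step:.1f}")  # 17.1
print(f"{speedup:.3f}")             # 1.021
```

On these numbers the end-to-end gain is about 2%, and the denoising loop accounts for roughly 95% of the MLX run time.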

## Code

[github.com/treadon/mlx-ernie-image](https://github.com/treadon/mlx-ernie-image)

**[Follow @treadon on X](https://x.com/treadon)** for more ML experiments

## Base Model

[baidu/ERNIE-Image-Turbo](https://huggingface.co/baidu/ERNIE-Image-Turbo): 8B DiT + Mistral-3 text encoder + FLUX.2 VAE. Apache 2.0.