---
license: apache-2.0
base_model:
- baidu/ERNIE-Image-Turbo
tags:
- mlx
- image-generation
- diffusion
- apple-silicon
- ernie-image
library_name: mlx
---
# ERNIE-Image-Turbo-MLX
Pre-converted MLX weights for [baidu/ERNIE-Image-Turbo](https://huggingface.co/baidu/ERNIE-Image-Turbo). Runs on Apple Silicon via [mlx-ernie-image](https://github.com/treadon/mlx-ernie-image).
## What's included
| File | Size | Component |
|------|------|-----------|
| `dit.npz` | 16.1 GB | DiT (8B, 36 layers) — pre-transposed for MLX |
| `vae.npz` | 100 MB | FLUX.2 VAE decoder — pre-transposed for MLX |
| `bn_stats.npz` | tiny | Batch norm running stats for latent denormalization |
| `config.json` | tiny | DiT architecture config |
Conv2d weights are pre-transposed from PyTorch's NCHW layout to MLX's NHWC layout, so no conversion is needed at runtime.
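The layout change amounts to a one-time axis permutation. A minimal NumPy sketch (an illustration of the transpose, not the actual conversion script):

```python
import numpy as np

# PyTorch Conv2d stores weights as (out_channels, in_channels, kH, kW).
w_torch = np.random.randn(16, 4, 3, 3).astype(np.float32)

# MLX's Conv2d expects (out_channels, kH, kW, in_channels),
# so the axes are permuted once at conversion time.
w_mlx = w_torch.transpose(0, 2, 3, 1)

print(w_mlx.shape)  # (16, 3, 3, 4)
```

Because the transpose is baked into the `.npz` files, loading them on Apple Silicon involves no per-run reshaping cost.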
## Usage
```python
from ernie_image import ErnieImagePipeline, TextEncoder
te = TextEncoder.from_pretrained()
pipe = ErnieImagePipeline.from_pretrained("treadon/ERNIE-Image-Turbo-MLX")
emb = te.encode("A vibrant manga comic about a cat and a dragon")
img = pipe.generate(text_embeddings=emb)
img.save("output.png")
```
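Internally, the running stats in `bn_stats.npz` are used to denormalize latents before the VAE decode step. A minimal sketch of that inversion with toy values (the array names and exact formula here are assumptions for illustration, not confirmed from the repo):

```python
import numpy as np

# Hypothetical running stats, as would be loaded from bn_stats.npz,
# e.g. via stats = np.load("bn_stats.npz").
running_mean = np.array([0.1, -0.2], dtype=np.float32)
running_var = np.array([0.25, 1.0], dtype=np.float32)
eps = 1e-5

# Toy normalized latent leaving the DiT.
latent = np.array([[1.0, 2.0]], dtype=np.float32)

# Invert batch-norm-style normalization: x = x_hat * sqrt(var + eps) + mean
denorm = latent * np.sqrt(running_var + eps) + running_mean
print(denorm)  # approximately [[0.6, 1.8]]
```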
## Benchmarks (1024×1024, 8 steps, M4 Pro 64 GB)
### MLX vs PyTorch/MPS
| Pipeline | Total | Per Step |
|----------|-------|----------|
| PyTorch/MPS (diffusers) | 137.0s | 17.1s/step |
| **MLX (this repo)** | **134.2s** | **16.0s/step** |
### Breakdown
| Component | Time |
|-----------|------|
| Text encode (PyTorch) | 0.1s |
| Denoise (MLX) | 128s |
| VAE decode (MLX) | 6s |
| **Total** | **~134s** |
## Code
[github.com/treadon/mlx-ernie-image](https://github.com/treadon/mlx-ernie-image)
**[Follow @treadon on X](https://x.com/treadon)** for more ML experiments.
## Base Model
[baidu/ERNIE-Image-Turbo](https://huggingface.co/baidu/ERNIE-Image-Turbo) — 8B DiT + Mistral-3 text encoder + FLUX.2 VAE. Apache 2.0.