---
license: apache-2.0
base_model:
- baidu/ERNIE-Image-Turbo
tags:
- mlx
- image-generation
- diffusion
- apple-silicon
- ernie-image
library_name: mlx
---

# ERNIE-Image-Turbo-MLX

Pre-converted MLX weights for [baidu/ERNIE-Image-Turbo](https://huggingface.co/baidu/ERNIE-Image-Turbo). Runs on Apple Silicon via [mlx-ernie-image](https://github.com/treadon/mlx-ernie-image).

## What's included

| File | Size | Component |
|------|------|-----------|
| `dit.npz` | 16.1 GB | DiT (8B, 36 layers), pre-transposed for MLX |
| `vae.npz` | 100 MB | FLUX.2 VAE decoder, pre-transposed for MLX |
| `bn_stats.npz` | tiny | Batch-norm running stats for latent denormalization |
| `config.json` | tiny | DiT architecture config |

Conv2d weights are pre-transposed from PyTorch's NCHW layout to MLX's NHWC layout, so no conversion is needed at runtime.

## Usage

```python
from ernie_image import ErnieImagePipeline, TextEncoder

te = TextEncoder.from_pretrained()
pipe = ErnieImagePipeline.from_pretrained("treadon/ERNIE-Image-Turbo-MLX")

emb = te.encode("A vibrant manga comic about a cat and a dragon")
img = pipe.generate(text_embeddings=emb)
img.save("output.png")
```

## Benchmarks (1024x1024, 8 steps, M4 Pro 64GB)

### MLX vs PyTorch/MPS

| Pipeline | Total | Per step |
|----------|-------|----------|
| PyTorch/MPS (diffusers) | 137.0 s | 17.1 s/step |
| **MLX (this repo)** | **134.2 s** | **16.0 s/step** |

### Breakdown

| Component | Time |
|-----------|------|
| Text encode (PyTorch) | 0.1 s |
| Denoise (MLX) | 128 s |
| VAE decode (MLX) | 6 s |
| **Total** | **~134 s** |

## Code

[github.com/treadon/mlx-ernie-image](https://github.com/treadon/mlx-ernie-image)

**[Follow @treadon on X](https://x.com/treadon)** for more ML experiments.

## Base Model

[baidu/ERNIE-Image-Turbo](https://huggingface.co/baidu/ERNIE-Image-Turbo): 8B DiT + Mistral-3 text encoder + FLUX.2 VAE. Apache 2.0.
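
## Appendix: weight layout conversion

The Conv2d pre-transpose described in "What's included" can be sketched as below. This is an illustrative NumPy sketch, not the converter actually used for this repo; it assumes the standard PyTorch Conv2d weight layout `(out, in, kH, kW)` and MLX's expected `(out, kH, kW, in)` layout.

```python
import numpy as np

def convert_conv2d_weight(torch_weight: np.ndarray) -> np.ndarray:
    """Transpose a PyTorch Conv2d weight (O, I, kH, kW) into
    the (O, kH, kW, I) layout MLX's conv expects."""
    return np.transpose(torch_weight, (0, 2, 3, 1))

# Toy example: a 3x3 conv mapping 16 channels to 32
w_torch = np.zeros((32, 16, 3, 3), dtype=np.float32)
w_mlx = convert_conv2d_weight(w_torch)
print(w_mlx.shape)  # (32, 3, 3, 16)
```

Baking this transpose into `dit.npz` and `vae.npz` is what lets the weights load directly at runtime with no per-launch conversion cost.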
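
The latent denormalization that `bn_stats.npz` supports can be sketched roughly as follows. This is a hedged illustration of inverting a batch-norm normalization with stored running statistics; the function name, array shapes, and `eps` value are assumptions, not this repo's actual keys or pipeline code.

```python
import numpy as np

def denormalize_latents(latents: np.ndarray,
                        running_mean: np.ndarray,
                        running_var: np.ndarray,
                        eps: float = 1e-5) -> np.ndarray:
    """Invert per-channel batch-norm normalization:
    x = z * sqrt(var + eps) + mean, broadcast over NHWC latents."""
    return latents * np.sqrt(running_var + eps) + running_mean

# Toy example with 4 latent channels
latents = np.random.randn(1, 8, 8, 4).astype(np.float32)
mean = np.zeros(4, dtype=np.float32)
var = np.ones(4, dtype=np.float32)
decoded_input = denormalize_latents(latents, mean, var)
print(decoded_input.shape)  # (1, 8, 8, 4)
```

The denormalized latents would then be handed to the VAE decoder to produce the final image.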