---
license: apache-2.0
base_model:
- baidu/ERNIE-Image-Turbo
tags:
- mlx
- image-generation
- diffusion
- apple-silicon
- ernie-image
library_name: mlx
---

# ERNIE-Image-Turbo-MLX

Pre-converted MLX weights for [baidu/ERNIE-Image-Turbo](https://huggingface.co/baidu/ERNIE-Image-Turbo). Runs on Apple Silicon via [mlx-ernie-image](https://github.com/treadon/mlx-ernie-image).

## What's included

| File | Size | Component |
|------|------|-----------|
| `dit.npz` | 16.1 GB | DiT (8B, 36 layers), pre-transposed for MLX |
| `vae.npz` | 100 MB | FLUX.2 VAE decoder, pre-transposed for MLX |
| `bn_stats.npz` | tiny | Batch norm running stats for latent denormalization |
| `config.json` | tiny | DiT architecture config |

Conv2d weights are pre-transposed from PyTorch's channels-first (NCHW-style) layout to MLX's channels-last (NHWC-style) layout, so no conversion is needed at runtime.
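
For reference, this kind of layout change can be sketched with NumPy. The sketch assumes the standard PyTorch `Conv2d` weight shape `(out, in, kH, kW)` and the `(out, kH, kW, in)` shape that MLX convolutions take. The helper name `to_mlx_conv2d_layout` is hypothetical, and this is not the repo's conversion script.

```python
import numpy as np

def to_mlx_conv2d_layout(weight):
    """Move the input-channel axis last: (O, I, kH, kW) -> (O, kH, kW, I)."""
    return np.ascontiguousarray(weight.transpose(0, 2, 3, 1))

# Toy PyTorch-style weight: 16 output channels, 8 input channels, 3x3 kernel.
pt_weight = np.arange(16 * 8 * 3 * 3, dtype=np.float32).reshape(16, 8, 3, 3)
mlx_weight = to_mlx_conv2d_layout(pt_weight)

print(pt_weight.shape, "->", mlx_weight.shape)
```

Doing this once at conversion time is what lets the runtime load the arrays as-is.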

## Usage

```python
from ernie_image import ErnieImagePipeline, TextEncoder

# Load the text encoder and the pre-converted MLX weights from this repo.
te = TextEncoder.from_pretrained()
pipe = ErnieImagePipeline.from_pretrained("treadon/ERNIE-Image-Turbo-MLX")

# Encode the prompt, then generate and save the image.
emb = te.encode("A vibrant manga comic about a cat and a dragon")
img = pipe.generate(text_embeddings=emb)
img.save("output.png")
```

## Benchmarks (1024x1024, 8 steps, M4 Pro 64GB)

### MLX vs PyTorch/MPS

| Pipeline | Total | Per Step |
|----------|-------|----------|
| PyTorch/MPS (diffusers) | 137.0s | 17.1s/step |
| MLX (this repo) | 134.2s | 16.0s/step |

### Breakdown

| Component | Time |
|-----------|------|
| Text encode (PyTorch) | 0.1s |
| Denoise (MLX) | 128s |
| VAE decode (MLX) | 6s |
| Total | ~134s |
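
As a quick sanity check on the numbers above, the component times can be summed and the per-step figures recomputed. The sketch uses only values from the tables. It suggests the MLX per-step figure matches denoise time divided by 8 steps, while the PyTorch/MPS figure matches total time divided by 8. How each figure was actually measured is an assumption.

```python
STEPS = 8

# Component times for the MLX pipeline, in seconds (from the breakdown table).
breakdown_s = {"text_encode": 0.1, "denoise": 128.0, "vae_decode": 6.0}

mlx_total_s = sum(breakdown_s.values())           # close to the reported ~134s
mlx_per_step_s = breakdown_s["denoise"] / STEPS   # denoise time only

pytorch_total_s = 137.0
pytorch_per_step_s = pytorch_total_s / STEPS      # total time spread over steps

# End-to-end speedup using the reported totals (137.0s vs 134.2s).
speedup = pytorch_total_s / 134.2

print(f"MLX total: {mlx_total_s:.1f}s, per step: {mlx_per_step_s:.1f}s")
print(f"PyTorch/MPS per step: {pytorch_per_step_s:.1f}s")
print(f"Speedup: {speedup:.2f}x")
```

Because the two per-step figures appear to be derived differently, the end-to-end totals are the safer basis for comparison.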

## Code

[github.com/treadon/mlx-ernie-image](https://github.com/treadon/mlx-ernie-image)

[Follow @treadon on X](https://x.com/treadon) for more ML experiments.

## Base Model

[baidu/ERNIE-Image-Turbo](https://huggingface.co/baidu/ERNIE-Image-Turbo): 8B DiT + Mistral-3 text encoder + FLUX.2 VAE. Apache 2.0.