docs: VAE is ERNIE's own 32-ch AutoencoderKLFlux2, not Flux's 16-ch ae
README.md
CHANGED
@@ -28,7 +28,7 @@ base_model: baidu/ERNIE-Image-Turbo
 <img alt="Distill" src="https://img.shields.io/badge/distill-Turbo-green" />
 </p>
 
-A mobile-friendly bundle of **ERNIE-Image-Turbo** — Baidu's 8B single-stream DiT, distilled for fast inference, with state-of-the-art text rendering quality among open-weight models. Bundled with Ministral 3B text encoder +
+A mobile-friendly bundle of **ERNIE-Image-Turbo** — Baidu's 8B single-stream DiT, distilled for fast inference, with state-of-the-art text rendering quality among open-weight models. Bundled with Ministral 3B text encoder + ERNIE's own 32-channel `AutoencoderKLFlux2` VAE for on-device inference via [**Mirage**](https://github.com/haplollc/Mirage).
 
 ERNIE-Image-Turbo is particularly strong at:
 - **Photorealism** at 1024×1024
@@ -41,9 +41,9 @@ ERNIE-Image-Turbo is particularly strong at:
 |---|---|---|
 | [`ernie-image-turbo-Q3_K_M.gguf`](./ernie-image-turbo-Q3_K_M.gguf) | Diffusion transformer — 8B params, Q3_K_M | 3.6 GB |
 | [`Ministral-3-3B-Instruct-2512-Q4_K_M.gguf`](./Ministral-3-3B-Instruct-2512-Q4_K_M.gguf) | Text encoder (Mistral3 emits the 3072-dim conditioning tensor ERNIE expects) | 2.0 GB |
-| [`ae.safetensors`](./ae.safetensors) | VAE (
+| [`ae.safetensors`](./ae.safetensors) | VAE — ERNIE's 32-channel `AutoencoderKLFlux2` (≠ Flux's 16-channel `ae.safetensors`; the two are not interchangeable) | 168 MB |
 
-Total bundle: **~5.
+Total bundle: **~5.7 GB**. Total GPU residency: ~7 GB. **iPhone 16 Pro / 17 Pro / Mac** territory.
 
 ## Quick start (Mirage)
 
@@ -77,7 +77,7 @@ If you need **text inside images** that actually renders correctly (signs, label
 | Diffusion transformer | [baidu/ERNIE-Image](https://github.com/baidu/ERNIE-Image) | Apache 2.0 |
 | GGUF conversion | [unsloth/ERNIE-Image-Turbo-GGUF](https://huggingface.co/unsloth/ERNIE-Image-Turbo-GGUF) | Apache 2.0 |
 | Text encoder | [unsloth/Ministral-3-3B-Instruct-2512-GGUF](https://huggingface.co/unsloth/Ministral-3-3B-Instruct-2512-GGUF) | Apache 2.0 |
-| VAE | [
+| VAE | [`baidu/ERNIE-Image-Turbo`](https://huggingface.co/baidu/ERNIE-Image-Turbo) — `vae/diffusion_pytorch_model.safetensors`, repacked as `ae.safetensors` | Apache 2.0 |
 
 ## Performance (Mirage, rough)
 
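Because ERNIE's VAE and Flux's VAE can both ship under the name `ae.safetensors`, it may help to confirm which one a given file is before loading it. The following is a minimal sketch that reads only the safetensors header (an 8-byte little-endian length followed by a JSON table of tensor names, dtypes, and shapes) and reports the latent channel count. The tensor name `decoder.conv_in.weight` and the `[out, in, kH, kW]` layout are assumptions based on common autoencoder naming; the repacked file in this bundle may use different keys, so treat the helper names and the default key as hypothetical.

```python
import json
import struct


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file as a dict.

    The format starts with an unsigned 64-bit little-endian integer giving
    the header length, followed by that many bytes of UTF-8 JSON.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len))


def latent_channels(path, key="decoder.conv_in.weight"):
    """Guess the VAE latent channel count from one conv weight's shape.

    ASSUMPTION: `key` names the decoder's first convolution and its weight
    is laid out as [out_channels, in_channels, kH, kW], so index 1 is the
    number of latent channels the decoder consumes.
    """
    header = read_safetensors_header(path)
    if key not in header:
        # List a few real keys so the caller can pick the right one.
        names = sorted(k for k in header if k != "__metadata__")
        raise KeyError(f"{key!r} not found; first keys: {names[:5]}")
    return header[key]["shape"][1]
```

Under those assumptions, `latent_channels("ae.safetensors")` would be expected to return 32 for the ERNIE VAE described in this change and 16 for Flux's VAE. No tensor data is read, so the check stays cheap even on a phone-sized budget.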