jc-builds committed on
Commit 68a98e4 · verified · 1 Parent(s): 3eb39c9

docs: VAE is ERNIE's own 32-ch AutoencoderKLFlux2, not Flux's 16-ch ae

Files changed (1)
  1. README.md +4 -4
README.md CHANGED
@@ -28,7 +28,7 @@ base_model: baidu/ERNIE-Image-Turbo
  <img alt="Distill" src="https://img.shields.io/badge/distill-Turbo-green" />
  </p>
 
- A mobile-friendly bundle of **ERNIE-Image-Turbo** — Baidu's 8B single-stream DiT, distilled for fast inference, with state-of-the-art text rendering quality among open-weight models. Bundled with Ministral 3B text encoder + Flux VAE for on-device inference via [**Mirage**](https://github.com/haplollc/Mirage).
+ A mobile-friendly bundle of **ERNIE-Image-Turbo** — Baidu's 8B single-stream DiT, distilled for fast inference, with state-of-the-art text rendering quality among open-weight models. Bundled with Ministral 3B text encoder + ERNIE's own 32-channel `AutoencoderKLFlux2` VAE for on-device inference via [**Mirage**](https://github.com/haplollc/Mirage).
 
  ERNIE-Image-Turbo is particularly strong at:
  - **Photorealism** at 1024×1024
@@ -41,9 +41,9 @@ ERNIE-Image-Turbo is particularly strong at:
  |---|---|---|
  | [`ernie-image-turbo-Q3_K_M.gguf`](./ernie-image-turbo-Q3_K_M.gguf) | Diffusion transformer — 8B params, Q3_K_M | 3.6 GB |
  | [`Ministral-3-3B-Instruct-2512-Q4_K_M.gguf`](./Ministral-3-3B-Instruct-2512-Q4_K_M.gguf) | Text encoder (Mistral3 emits the 3072-dim conditioning tensor ERNIE expects) | 2.0 GB |
- | [`ae.safetensors`](./ae.safetensors) | VAE (from FLUX.1) | 320 MB |
+ | [`ae.safetensors`](./ae.safetensors) | VAE — ERNIE's 32-channel `AutoencoderKLFlux2` (not Flux's 16-channel `ae.safetensors`; the two are not interchangeable) | 168 MB |
 
- Total bundle: **~5.9 GB**. Total GPU residency: ~7 GB. **iPhone 16 Pro / 17 Pro / Mac** territory.
+ Total bundle: **~5.7 GB**. Total GPU residency: ~7 GB. **iPhone 16 Pro / 17 Pro / Mac** territory.
 
  ## Quick start (Mirage)
 
@@ -77,7 +77,7 @@ If you need **text inside images** that actually renders correctly (signs, label
  | Diffusion transformer | [baidu/ERNIE-Image](https://github.com/baidu/ERNIE-Image) | Apache 2.0 |
  | GGUF conversion | [unsloth/ERNIE-Image-Turbo-GGUF](https://huggingface.co/unsloth/ERNIE-Image-Turbo-GGUF) | Apache 2.0 |
  | Text encoder | [unsloth/Ministral-3-3B-Instruct-2512-GGUF](https://huggingface.co/unsloth/Ministral-3-3B-Instruct-2512-GGUF) | Apache 2.0 |
- | VAE | [ffxvs/vae-flux](https://huggingface.co/ffxvs/vae-flux) (re-host of FLUX.1's `ae.safetensors`) | FLUX-1-dev-non-commercial |
+ | VAE | [`baidu/ERNIE-Image-Turbo`](https://huggingface.co/baidu/ERNIE-Image-Turbo) `vae/diffusion_pytorch_model.safetensors`, repacked as `ae.safetensors` | Apache 2.0 |
 
  ## Performance (Mirage, rough)
 
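Since the 32-channel and 16-channel VAEs share the filename `ae.safetensors`, a quick local check of which one is on disk may be useful. The sketch below is not part of this commit. It reads only the safetensors header (an 8-byte little-endian length followed by a JSON table of tensor names and shapes) and reports the input channel count of the decoder's first convolution. The tensor name `decoder.conv_in.weight` and the helper names are assumptions for illustration; the repacked file may use different key names, in which case pass the right key explicitly.

```python
import json
import struct


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without loading weights."""
    with open(path, "rb") as f:
        # The file begins with an unsigned little-endian 64-bit header length.
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len))


def latent_channels(path, key="decoder.conv_in.weight"):
    """Return the input channel count of the decoder's first convolution.

    ASSUMPTION: `key` names a Conv2d weight laid out as
    [out_channels, in_channels, kH, kW], and its in_channels equals the
    VAE latent channel count. The default key is a guess, not confirmed
    for the repacked ae.safetensors in this bundle.
    """
    header = read_safetensors_header(path)
    if key not in header:
        # Show a few available names so the caller can pick the right key.
        names = sorted(k for k in header if k != "__metadata__")
        raise KeyError(f"{key!r} not found; first keys: {names[:5]}")
    return header[key]["shape"][1]
```

Calling `latent_channels("ae.safetensors")` would be expected to return 32 for the VAE described in this commit and 16 for FLUX.1's, provided the key-name assumption holds.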