Instructions to use jc-builds/Z-Image-Turbo-iOS with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use jc-builds/Z-Image-Turbo-iOS with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="jc-builds/Z-Image-Turbo-iOS",
	filename="Qwen3-4B-Instruct-2507-Q4_K_M.gguf",
)

llm.create_chat_completion(
	messages = "\"Astronaut riding a horse\""
)

Notebooks
Google Colab
Kaggle
Local Apps

llama.cpp

How to use jc-builds/Z-Image-Turbo-iOS with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf jc-builds/Z-Image-Turbo-iOS:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf jc-builds/Z-Image-Turbo-iOS:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf jc-builds/Z-Image-Turbo-iOS:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf jc-builds/Z-Image-Turbo-iOS:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf jc-builds/Z-Image-Turbo-iOS:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf jc-builds/Z-Image-Turbo-iOS:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf jc-builds/Z-Image-Turbo-iOS:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf jc-builds/Z-Image-Turbo-iOS:Q4_K_M

Use Docker

docker model run hf.co/jc-builds/Z-Image-Turbo-iOS:Q4_K_M

LM Studio
Jan
Ollama
How to use jc-builds/Z-Image-Turbo-iOS with Ollama:
```
ollama run hf.co/jc-builds/Z-Image-Turbo-iOS:Q4_K_M
```

Unsloth Studio new

How to use jc-builds/Z-Image-Turbo-iOS with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for jc-builds/Z-Image-Turbo-iOS to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for jc-builds/Z-Image-Turbo-iOS to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for jc-builds/Z-Image-Turbo-iOS to start chatting

Pi new

How to use jc-builds/Z-Image-Turbo-iOS with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf jc-builds/Z-Image-Turbo-iOS:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "jc-builds/Z-Image-Turbo-iOS:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent new

How to use jc-builds/Z-Image-Turbo-iOS with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf jc-builds/Z-Image-Turbo-iOS:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default jc-builds/Z-Image-Turbo-iOS:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use jc-builds/Z-Image-Turbo-iOS with Docker Model Runner:
```
docker model run hf.co/jc-builds/Z-Image-Turbo-iOS:Q4_K_M
```

Lemonade

How to use jc-builds/Z-Image-Turbo-iOS with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull jc-builds/Z-Image-Turbo-iOS:Q4_K_M

Run and chat with the model

lemonade run user.Z-Image-Turbo-iOS-Q4_K_M

List all available models

lemonade list

Z-Image-Turbo-iOS / README.md

jc-builds

engine renamed: KilnImage → Mirage

1f0498b verified 4 days ago

preview code

raw

history blame contribute delete

5.06 kB

	---
	license: apache-2.0
	language:
	- en
	- zh
	pipeline_tag: text-to-image
	tags:
	- text-to-image
	- diffusion
	- z-image
	- s3-dit
	- gguf
	- quantized
	- on-device
	- ios
	- mobile
	- apple-silicon
	base_model: Tongyi-MAI/Z-Image-Turbo
	---

	# Z-Image-Turbo — iOS bundle

	<p align="center">
	<a href="https://github.com/haplollc/Mirage">
	<img alt="Mirage" src="https://img.shields.io/badge/Runs%20on-Mirage-orange" />
	</a>
	<a href="https://huggingface.co/Tongyi-MAI/Z-Image-Turbo">
	<img alt="Upstream" src="https://img.shields.io/badge/Upstream-Tongyi--MAI%2FZ--Image--Turbo-blue" />
	</a>
	<img alt="License" src="https://img.shields.io/badge/license-Apache--2.0-lightgrey" />
	<img alt="Params" src="https://img.shields.io/badge/params-6B-purple" />
	<img alt="Steps" src="https://img.shields.io/badge/steps-9-green" />
	</p>

	A pre-flighted bundle of Z-Image-Turbo + Qwen3-4B-Instruct (text encoder) + FLUX VAE, sized and quantized to fit on iPhone 16 Pro / 17 Pro and run via [Mirage](https://github.com/haplollc/Mirage) — the on-device diffusion engine for iOS / macOS / visionOS.

	Z-Image-Turbo is a 6B-parameter [S3-DiT](https://arxiv.org/abs/2511.22699) (Scalable Single-Stream Diffusion Transformer), distilled to 8-9 sampling steps via Decoupled-DMD + DMDR. It produces photorealistic images at 1024×1024 with bilingual (English + Chinese) prompt understanding.

	## What's inside

	\| File \| Role \| Size \|
	\|---\|---\|---\|
	\| [`z-image-turbo-Q3_K_M.gguf`](./z-image-turbo-Q3_K_M.gguf) \| Diffusion transformer — 6B params, Q3_K_M quant \| 3.9 GB \|
	\| [`Qwen3-4B-Instruct-2507-Q4_K_M.gguf`](./Qwen3-4B-Instruct-2507-Q4_K_M.gguf) \| Text encoder \| 2.3 GB \|
	\| [`ae.safetensors`](./ae.safetensors) \| VAE (from FLUX.1) \| 320 MB \|

	Total bundle size: ~6.5 GB. Total GPU residency at generation time: ~7-8 GB (weights + activations + KV cache).

	## Quick start (Mirage)

	```swift
	import Mirage

	let docs = FileManager.default.urls(for: .documentDirectory, in: .userDomainMask)[0]

	let engine = try Engine(models: ModelFiles(
	diffusionModel: docs.appendingPathComponent("z-image-turbo-Q3_K_M.gguf"),
	vae: docs.appendingPathComponent("ae.safetensors"),
	textEncoder: docs.appendingPathComponent("Qwen3-4B-Instruct-2507-Q4_K_M.gguf")
	))

	let image = try await engine.generate(.init(
	prompt: "a photorealistic golden retriever puppy in a sunlit field of wildflowers",
	width: 1024, height: 1024,
	steps: 9, // Turbo distillation — don't go higher
	cfgScale: 1.0 // CFG is baked in
	))
	```

	That's the whole pipeline. See the [Mirage README](https://github.com/haplollc/Mirage) for the full SwiftUI example.

	## Performance (measured via Mirage)

	\| Device \| 1024² @ 9 steps \| 512² @ 9 steps \|
	\|---\|---\|---\|
	\| iPhone 17 Pro \| ~3 min \| ~50 s \|
	\| iPhone 16 Pro \| ~5 min \| ~90 s \|
	\| M2 / M3 Mac \| ~7.5 min \| ~2 min \|

	Memory ceiling — iPhone 14 and older cannot run this bundle. Gate availability on:

	```swift
	ProcessInfo.processInfo.physicalMemory >= 8 * 1024 * 1024 * 1024
	```

	## Sample output

	Prompt: "a single red apple on a white background, photorealistic" · 256² · 4 steps · 28 s on Apple Silicon Mac:

	![sample-apple](https://raw.githubusercontent.com/haplollc/Mirage/main/Resources/sample-apple.png)

	Prompt: "a photorealistic golden retriever puppy in a sunlit field of wildflowers" · 1024² · 9 steps · 7.5 min on Apple Silicon Mac:

	![sample-puppy](https://raw.githubusercontent.com/haplollc/Mirage/main/Resources/sample-puppy.png)

	## Why this bundle exists

	The official Z-Image release is PyTorch + Diffusers — great for servers, doesn't run on iPhone. Unsloth shipped the GGUF-quantized variant, but using it on iOS requires:

	1. An engine that speaks GGUF + S3-DiT (only stable-diffusion.cpp does, as of Dec 2025)
	2. A matching text encoder (Z-Image's training partner is Qwen3-4B, not the more common T5 or CLIP)
	3. A VAE (Z-Image reuses FLUX.1's `ae.safetensors`)

	Picking those three apart from upstream takes effort. This bundle packages them once, with the right quants for iPhone memory budgets.

	## Provenance

	\| Component \| Upstream \| License \|
	\|---\|---\|---\|
	\| Diffusion transformer \| [Tongyi-MAI/Z-Image-Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo) \| Apache 2.0 \|
	\| GGUF conversion \| [unsloth/Z-Image-Turbo-GGUF](https://huggingface.co/unsloth/Z-Image-Turbo-GGUF) \| Apache 2.0 \|
	\| Text encoder \| [unsloth/Qwen3-4B-Instruct-2507-GGUF](https://huggingface.co/unsloth/Qwen3-4B-Instruct-2507-GGUF) \| Tongyi-Qianwen \|
	\| VAE \| [ffxvs/vae-flux](https://huggingface.co/ffxvs/vae-flux) (re-host of FLUX.1's `ae.safetensors`) \| FLUX-1-dev-non-commercial \|

	## License

	This repository's bundling and documentation are released under Apache 2.0. The individual model weights retain their upstream licenses (linked above). Read each license before commercial use.

	## Built by

	[Haplo](https://haplo.app) · [@jc_builds](https://twitter.com/jc_builds) · [Mirage on GitHub](https://github.com/haplollc/Mirage)