	---
	license: apache-2.0
	base_model: black-forest-labs/FLUX.2-klein-4B
	tags:
	- core-ai
	- coreai
	- text-to-image
	- flux
	- flux2
	- on-device
	- apple-silicon
	pipeline_tag: text-to-image
	library_name: coreai
	---

	# FLUX.2 klein 4B — Core AI

	[Black Forest Labs' FLUX.2 [klein] 4B](https://huggingface.co/black-forest-labs/FLUX.2-klein-4B)
	converted to Core AI for on-device image generation on Apple Silicon (macOS 27+),
	running on Apple's official diffusion runtime in
	[apple/coreai-models](https://github.com/apple/coreai-models).

	FLUX.2 [klein] is step-distilled: 4 denoising steps at guidance 1.0 produce a full
	1024×1024 image. It pairs a 4B flow-matching diffusion transformer (DiT) with a 4B
	Qwen3 text encoder.
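
To make the 4-step claim concrete, here is a minimal Python sketch of Euler integration over a flow-matching schedule. It is illustrative only: the linear sigma schedule, the `euler_sample` helper, and the toy velocity function are assumptions made for this example, not the scheduler or API used by the Core AI runtime.

```python
from typing import Callable, List

Vector = List[float]


def linear_sigmas(steps: int) -> List[float]:
    """Return steps + 1 noise levels from 1.0 (pure noise) down to 0.0 (clean)."""
    return [1.0 - i / steps for i in range(steps + 1)]


def euler_sample(
    noise: Vector,
    velocity: Callable[[Vector, float], Vector],
    steps: int = 4,
) -> Vector:
    """Integrate dx/dsigma = v(x, sigma) from sigma = 1 to sigma = 0 with Euler steps."""
    x = list(noise)
    sigmas = linear_sigmas(steps)
    for sigma, sigma_next in zip(sigmas, sigmas[1:]):
        v = velocity(x, sigma)  # one model call per step
        x = [xi + (sigma_next - sigma) * vi for xi, vi in zip(x, v)]
    return x


# Toy stand-in for the DiT (assumption): on the straight path
# x = (1 - sigma) * data + sigma * noise, the exact velocity is (noise - data).
data = [0.25, -0.5, 1.0]
noise = [1.5, 0.5, -2.0]


def toy_velocity(x: Vector, sigma: float) -> Vector:
    return [n - d for n, d in zip(noise, data)]


sample = euler_sample(noise, toy_velocity, steps=4)
```

In this toy case the velocity is exact, so four steps recover `data`. A real model's prediction is only approximate, and step distillation is what makes so few steps viable.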

	> macOS only. At 4B, the peak memory footprint (~6.5 GB, because the text encoder
	> stays resident while the transformer runs) exceeds the ~6.1 GB per-process memory
	> limit of a 12 GB iPhone, even with the transformer AOT-compiled. For on-device iOS
	> image generation, use a smaller diffusion model (e.g. Stable Diffusion, 0.9B).

	## Components

	| Component | Description |
	| --- | --- |
	| `Transformer.aimodel` | Flow-matching DiT (25 blocks), 1024×1024 |
	| `TextEncoder.aimodel` | Qwen3 text encoder (hidden states 9 / 18 / 27) |
	| `VAEDecoder.aimodel` | Latent → 1024×1024 RGB image |
	| `VAEEncoder.aimodel` | 1024×1024 RGB image → latent (image-to-image) |
	| `tokenizer/`, `pipeline.json`, `vae_bn_*.npy` | Sidecar assets (auto-loaded) |

	Weights are 4-bit quantized (int4, per-block, block size 32); compute precision is
	float16. The full bundle is 4.0 GB: Transformer 2.0 GB, TextEncoder 1.8 GB,
	VAE 0.16 GB.
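
As a rough illustration of what "int4, per-block, block size 32" means, the following pure-Python sketch quantizes a weight vector in blocks of 32, each with its own scale. It assumes symmetric quantization to the range [-8, 7] with one float16 scale per block. The actual scheme used by the export tool (zero points, scale dtype, packing) is not stated in this card, and the function names here are hypothetical.

```python
from typing import List, Tuple

BLOCK_SIZE = 32          # weights sharing one scale
QMIN, QMAX = -8, 7       # signed 4-bit integer range


def quantize_blocks(weights: List[float]) -> Tuple[List[int], List[float]]:
    """Quantize weights to int4 codes with one scale per block (symmetric, assumed)."""
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(weights), BLOCK_SIZE):
        block = weights[start:start + BLOCK_SIZE]
        max_abs = max(abs(w) for w in block)
        scale = max_abs / QMAX if max_abs > 0 else 1.0
        scales.append(scale)
        codes.extend(max(QMIN, min(QMAX, round(w / scale))) for w in block)
    return codes, scales


def dequantize_blocks(codes: List[int], scales: List[float]) -> List[float]:
    """Reconstruct approximate weights: code times the scale of its block."""
    return [c * scales[i // BLOCK_SIZE] for i, c in enumerate(codes)]


def bits_per_weight(scale_bits: int = 16) -> float:
    """Storage cost per weight: 4 code bits plus the amortized per-block scale."""
    return 4 + scale_bits / BLOCK_SIZE


weights = [((i * 37) % 64 - 32) / 100.0 for i in range(64)]  # two blocks of toy data
codes, scales = quantize_blocks(weights)
restored = dequantize_blocks(codes, scales)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Under these assumptions each weight costs 4.5 bits rather than 4, which is why a 4-bit model is somewhat larger than half a byte per parameter. The reconstruction error per weight is bounded by half the block's scale.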

	## Usage

	### Sample app (easiest)

	[CoreAIImageGen (macOS)](https://github.com/john-rocky/coreai-model-zoo/tree/main/apps/CoreAIImageGen):
	run the `CoreAIImageGenMac` scheme, click Download & Load, type a prompt, then click Generate.

	### Swift

	```swift
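	import CoreAIDiffusionPipeline

	// modelURL is assumed to point at the local model bundle directory.
	let pipeline = try await Flux2Pipeline(from: modelURL)

	// Distilled settings: 4 steps at guidance 1.0 (no negative prompt / CFG).
	let config = PipelineConfiguration(
	    prompt: "a photo of a cat",
	    stepCount: 4,
	    guidanceScale: 1.0,
	    schedulerType: .discreteFlow
	)

	// Trailing closure: per-step callback; here it always returns true.
	let result = try await pipeline.generateImages(configuration: config) { _ in true }
	guard let image = result.images.first else {
	    fatalError("Pipeline returned no images")
	}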
	```

	### Command line (zoo reference tool)

	```bash
	swift run -c release diffusion-runner \
	  --model path/to/FLUX.2-klein-4B \
	  --prompt "a photo of a cat" --steps 4 --guidance-scale 1.0
	```

	## How it was converted

	```bash
	uv run coreai.diffusion.export flux2-klein-4b --platform macOS
	```

	## Performance

	M4 Max (128 GB): ~17 s for a 4-step 1024×1024 image, end to end (cold model load +
	4 denoising steps + VAE decode). Because the model is step-distilled to run at
	guidance 1.0, no negative prompt or classifier-free guidance (CFG) is needed.
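
Why guidance 1.0 removes the need for a negative prompt can be seen from the usual classifier-free guidance formula, `uncond + scale * (cond - uncond)`, which reduces to `cond` when `scale` is 1. The sketch below is generic Python with a hypothetical `apply_guidance` helper; it does not reflect how the Core AI runtime implements guidance.

```python
from typing import List, Optional


def apply_guidance(
    cond: List[float],
    uncond: Optional[List[float]],
    scale: float,
) -> List[float]:
    """Combine conditional and unconditional predictions with classifier-free guidance."""
    if scale == 1.0:
        # The unconditional term cancels, so its forward pass can be skipped entirely.
        return list(cond)
    if uncond is None:
        raise ValueError("an unconditional prediction is required when scale != 1.0")
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


cond = [0.5, -1.0, 2.0]
uncond = [0.25, 0.0, 1.0]

distilled = apply_guidance(cond, None, scale=1.0)   # one model call per step
guided = apply_guidance(cond, uncond, scale=3.0)    # two model calls per step
```

At scale 1.0 only the conditional prediction is evaluated, so each denoising step costs a single transformer pass instead of two.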

	## License

	Apache 2.0, inherited from the base model
	[black-forest-labs/FLUX.2-klein-4B](https://huggingface.co/black-forest-labs/FLUX.2-klein-4B).
	The converted weights are redistributed under the same terms, with attribution to
	Black Forest Labs.