---
license: apache-2.0
pipeline_tag: text-to-image
tags:
- 1-bit
- mlx
- apple-silicon
- on-device
- text-to-image
- diffusion
- flux
- prismml
- bonsai
base_model:
- prism-ml/bonsai-image-binary-4B-unpacked
---

Bonsai Image

Prism ML Website  |  Whitepaper  |  Demo & Examples  |  Discord

# bonsai-image-binary-4B-mlx-1bit

Binary-weight (1-bit) text-to-image diffusion transformer deployment for Apple Silicon

> **0.93 GB transformer** | **8.3×** smaller than FP16 | **9.4 s / 512²** on iPhone 17 Pro Max | **6 s / 512²** on M4 Pro | runs on Mac, iPhone, iPad

## Highlights

- **0.93 GB** diffusion transformer, down from **7.75 GB** for the FP16 FLUX.2 Klein 4B transformer
- Binary {−1, +1} transformer weights with FP16 group-wise scales in the matrix-heavy transformer layers (Q/K/V projections, output projections, MLP weights)
- 3.42 GB Apple Silicon deployment payload, including the 4-bit text encoder and FP16 VAE; the text encoder is offloaded after prompt encoding, so only the compact transformer and VAE stay resident during the denoising loop
- 4-step FlowMatch-Euler sampler with guidance = 1.0 and shift = 3.0: no CFG and no negative prompts needed
- MLX-native 1-bit format for Apple Silicon, using the same kernel path as our 1-bit language-model releases
- Cross-platform companion: also available as [gemlite 1-bit](https://huggingface.co/prism-ml/bonsai-image-binary-4B-gemlite-1bit) for NVIDIA GPUs

## Resources

- **[Whitepaper](https://github.com/PrismML-Eng/Bonsai-Image-Demo/blob/main/bonsai-image-4b-whitepaper.pdf)**: full benchmarks, kernels, and memory analysis
- **[Demo repo](https://github.com/PrismML-Eng/Bonsai-Image-Demo)**: one-command setup for Mac / Linux / Windows
- **[Discord](https://discord.gg/prismml)**: community and support
- **Kernels**: [MLX fork](https://github.com/PrismML-Eng/mlx) (Apple Silicon) · [mlx-swift fork](https://github.com/PrismML-Eng/mlx-swift) (iOS / macOS); upstream PRs pending

## Model Overview

| Item                  | Specification                                                                          |
| :-------------------- | :------------------------------------------------------------------------------------- |
| Base architecture     | FLUX.2 Klein 4B (MMDiT diffusion transformer)                                          |
| Parameters            | ~4.0B (transformer trunk)                                                              |
| Blocks                | 25 MMDiT blocks: 5 double-stream + 20 single-stream                                    |
| Sampler               | FlowMatchEuler, **4 steps**, guidance = 1.0, shift = 3.0                               |
| Text encoder          | Qwen3-4B at 4-bit (≈ 2.28 GB on-device, offloaded after prompt encoding)               |
| VAE                   | Flux2 32-channel latent, tiled decode (128 px tiles)                                   |
| Native resolution     | 1024×1024 (also supports 512×512 and other sizes that are multiples of 32)             |
| Weight format         | MLX 1-bit g128: binary values + FP16 group-wise scales                                 |
| **Transformer size**  | **0.93 GB** (8.3× smaller than 7.75 GB FP16)                                           |
| Total payload         | **3.42 GB** (4.7× smaller than the 15.97 GB FP16 transformer + text encoder + VAE)     |
| 1-bit coverage        | All 100 matmul-heavy linears in the 25 MMDiT blocks                                    |
| License               | Apache 2.0                                                                             |

## Binary Weight Representation: 1-bit g128

Each binary weight takes a value from {−1, +1}, with one shared FP16 scale per group of 128 weights:

```
w_i = scale_g * b_i,   b_i in {−1, +1}
```

Binary values carry exactly 1 bit of information per weight. With one FP16 scale per group of 128, the effective storage is

```
b_eff = 1 + 16/128 = 1.125 bits/weight
```

This gives an idealized **14.2× reduction** (16 / 1.125) relative to FP16 for the binary transformer layers. A small set of precision-sensitive supporting tensors remains in FP16, so the final 1-bit Bonsai Image 4B diffusion transformer is **0.93 GB**, an 8.3× reduction from the 7.75 GB FP16 FLUX.2 Klein 4B transformer.

The binary representation is applied to the matrix-heavy transformer layers, including the Q / K / V projections, output projections, MLP linears, and the double-stream add-K / Q / V linears. Supporting tensors (less than 5% of the total parameters), such as the modulation streams, embedders, output norm, and final output projection, remain in FP16 for image quality and stability.
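The group-wise scheme above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Prism ML's quantizer or the on-disk MLX layout: the card does not say how each group's scale is chosen, so the sketch assumes the mean absolute value of the group, which is the closed-form scale that minimizes squared reconstruction error once the signs are fixed. The helpers `binarize_g128`, `dequantize`, and `effective_bits` are hypothetical names.

```python
import math
import random

GROUP_SIZE = 128  # weights sharing one scale (the "g128" in 1-bit g128)
SCALE_BITS = 16   # one FP16 scale per group


def binarize_g128(weights, group_size=GROUP_SIZE):
    """Split a flat weight list into groups and binarize each group.

    Returns (signs, scales): signs[i] is +1 or -1, scales[g] is the shared
    scale for group g. ASSUMPTION: scale = mean(|w|) over the group, which
    minimizes sum((w - scale * b)**2) when b = sign(w) is held fixed.
    """
    if len(weights) % group_size != 0:
        raise ValueError("length must be a multiple of the group size")
    signs, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        scales.append(sum(abs(w) for w in group) / group_size)
        # Map zero to +1 so every weight lands in {-1, +1}.
        signs.extend(1 if w >= 0 else -1 for w in group)
    return signs, scales


def dequantize(signs, scales, group_size=GROUP_SIZE):
    """Reconstruct w_i = scale_g * b_i."""
    return [scales[i // group_size] * b for i, b in enumerate(signs)]


def effective_bits(group_size=GROUP_SIZE, scale_bits=SCALE_BITS):
    """1 bit per weight plus the amortized cost of one scale per group."""
    return 1 + scale_bits / group_size


if __name__ == "__main__":
    random.seed(0)
    w = [random.gauss(0.0, 0.02) for _ in range(4 * GROUP_SIZE)]
    b, s = binarize_g128(w)
    w_hat = dequantize(b, s)
    rmse = math.sqrt(sum((x - y) ** 2 for x, y in zip(w, w_hat)) / len(w))
    print(f"groups={len(s)} rmse={rmse:.5f}")
    print(f"bits/weight={effective_bits()}  vs FP16: {16 / effective_bits():.1f}x")
```

Running it prints the group count, a reconstruction RMSE, and `bits/weight=1.125  vs FP16: 14.2x`, matching the arithmetic above. A real deployment format would also pack the signs into bits and store the scales as FP16; this list-based sketch skips both for readability.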
### Memory

| Format                     | Transformer size | Reduction |    Ratio |
| :------------------------- | ---------------: | --------: | -------: |
| FP16 FLUX.2 Klein 4B       |          7.75 GB |         — |     1.0× |
| **1-bit Bonsai Image 4B**  |      **0.93 GB** | **88.0%** | **8.3×** |

Apple Silicon deployment:

| Component                       |        Size |
| :------------------------------ | ----------: |
| MLX 1-bit diffusion transformer |     0.97 GB |
| Compressed text encoder         |     2.28 GB |
| FP16 VAE                        |     0.17 GB |
| **Total payload**               | **3.42 GB** |

At runtime, the text encoder is offloaded after prompt encoding. During denoising, memory use is dominated by the compact binary diffusion transformer and the other active image-generation components (such as the VAE) rather than by the full payload. On a Mac M4 Pro, end-to-end mean active memory pressure at 1024² is **1.95 GB**, a **7.4×** reduction versus the stock FP16 MFLUX pipeline (14.39 GB).

## Best Practices

- Sampler: FlowMatchEuler (discrete) with 4 steps, guidance = 1.0 (no classifier-free guidance), and shift = 3.0. The model is designed for 4 steps; running more steps does not significantly improve quality and can introduce artifacts.
- Resolution: native 1024² is the design target; 512² works for quick previews.
- Aspect ratios: any width and height that are multiples of 32 are supported, including 832×1248 and 1248×832.
- Prompting: natural-language prompts. Negative prompts are not required.
- Runtime memory: the text encoder is offloaded after prompt encoding, so the denoising loop is memory-light.
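To illustrate what the sampler settings above mean, here is a minimal 4-step flow-matching Euler loop in plain Python. It is a sketch under stated assumptions, not the Bonsai runtime: it assumes the common rectified-flow convention x_σ = (1 − σ)·x₀ + σ·noise with a velocity prediction v = noise − x₀, an evenly spaced base schedule, and the static time shift σ' = shift·σ / (1 + (shift − 1)·σ) used by diffusers' `FlowMatchEulerDiscreteScheduler`. The transformer is replaced by a hypothetical oracle `velocity_fn`. With guidance = 1.0, the classifier-free guidance combination u + g·(c − u) reduces to the conditional prediction c, so each step needs only one transformer pass.

```python
from typing import Callable, List


def shifted_sigmas(num_steps: int = 4, shift: float = 3.0) -> List[float]:
    """Evenly spaced sigmas from 1 to 0, warped by a static time shift.

    ASSUMPTION: evenly spaced base schedule; shift > 1 pushes the
    intermediate sigmas toward 1, i.e. toward the high-noise end.
    """
    base = [1.0 - i / num_steps for i in range(num_steps + 1)]
    return [shift * s / (1.0 + (shift - 1.0) * s) for s in base]


def euler_sample(
    x: List[float],
    velocity_fn: Callable[[List[float], float], List[float]],
    num_steps: int = 4,
    shift: float = 3.0,
) -> List[float]:
    """Integrate dx/dsigma = v from sigma = 1 (pure noise) down to sigma = 0."""
    sigmas = shifted_sigmas(num_steps, shift)
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        # guidance = 1.0: a single conditional prediction, no CFG pair.
        v = velocity_fn(x, sigma)
        x = [xi + (sigma_next - sigma) * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    target = [0.3, -0.7, 1.2]  # hypothetical clean latent x0
    noise = [1.0, 0.5, -0.2]

    # Hypothetical oracle standing in for the transformer: on the straight
    # path x = (1 - sigma) * x0 + sigma * noise, the velocity is noise - x0.
    def oracle(x, sigma):
        return [n - t for n, t in zip(noise, target)]

    print(shifted_sigmas())
    print(euler_sample(list(noise), oracle))
```

Running it prints the shifted schedule `[1.0, 0.9, 0.75, 0.5, 0.0]` and a result equal to the hypothetical target up to floating-point rounding. With shift = 3.0, the first three steps are small (σ from 1.0 down to 0.5) and the last is a single large step to 0, so most of the step budget is spent at high noise. Because the oracle velocity is exact on a straight path, Euler recovers x₀ for any schedule; a real transformer's prediction is only approximate, which is where the step count and shift matter.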
## Quickstart

### MLX (Python)

The simplest path is the [Bonsai Image Demo repo](https://github.com/PrismML-Eng/Bonsai-Image-Demo), which sets up the full Bonsai Studio (FastAPI backend + Next.js frontend):

```bash
git clone https://github.com/PrismML-Eng/Bonsai-Image-Demo.git
cd Bonsai-Image-Demo
./setup.sh
BONSAI_VARIANT=binary ./scripts/download_model.sh
BONSAI_VARIANT=binary ./scripts/serve.sh
```

For a one-shot render without the studio frontend:

```bash
BONSAI_VARIANT=binary ./scripts/generate.sh --prompt "A bonsai tree in a quiet ceramic studio, soft morning light"
```

### MLX Swift (iOS / macOS)

Binary Bonsai Image 4B runs natively on iPhone and iPad via MLX Swift. Bonsai Studio for iPhone is available on the App Store; under the hood, it loads this model with the kernels in our [mlx-swift fork](https://github.com/PrismML-Eng/mlx-swift).

## Throughput (MLX / Apple Silicon)

Mac M4 Pro (48 GB unified memory), 4 denoising steps, fixed prompt and seed:

| Resolution    | s / step | s / image (mean ± std) | Speedup vs stock MFLUX FP16 |
| :------------ | -------: | ---------------------: | --------------------------: |
| 512 × 512     |     1.50 |          6.01 ± 0.31 s |                   **3.03×** |
| 1024 × 1024   |     6.02 |     **24.07 ± 0.03 s** |                   **5.60×** |

iPhone 17 Pro Max (A19 Pro, 12 GB unified memory), MLX Swift, same methodology:

| Resolution    | s / step |  s / image |
| :------------ | -------: | ---------: |
| 128 × 128     |     0.68 |      2.7 s |
| 256 × 256     |     0.95 |      3.8 s |
| 512 × 512     |     2.35 |  **9.4 s** |
| 1024 × 1024   |     8.15 | **32.6 s** |

Stock FP16 FLUX.2 Klein 4B does not fit within the iPhone 17 Pro Max's 12 GB unified memory budget; Bonsai Image 4B models do.

## Benchmarks

All models in the comparison set were evaluated on H100 with matched generation settings. GenEval uses the official 512×512 protocol. For HPSv3 and DPG-Bench, larger-backbone models are evaluated at 1024×1024, while smaller-backbone models are evaluated at their native 512×512 setting. Higher is better for all three benchmarks.
| Model                         | Transformer (GB) |   GenEval |     HPSv3 | DPG-Bench |
| :---------------------------- | ---------------: | --------: | --------: | --------: |
| **Bonsai Image · Binary 4B**  |         **0.93** | **0.671** | **11.15** | **0.822** |
| **Bonsai Image · Ternary 4B** |         **1.21** | **0.723** | **12.22** | **0.851** |
| FLUX.2 Klein 4B               |             7.75 |     0.819 |     12.84 |     0.853 |
| FLUX.1-schnell                |             23.8 |     0.716 |     12.67 |     0.848 |
| SDXL                          |             5.14 |     0.300 |     10.05 |     0.740 |
| PixArt-Σ XL 2                 |             1.20 |     0.541 |     11.93 |     0.769 |
| Stable Diffusion 1.5          |             1.72 |     0.396 |      4.20 |     0.601 |
| BK-SDM-Small                  |             0.98 |     0.297 |      3.05 |     0.559 |

The benchmark results show the intended quality-footprint trade-off. The 1-bit Bonsai Image 4B is the footprint-oriented variant: it brings the diffusion transformer below 1 GB while retaining GenEval 0.671, HPSv3 11.15, and DPG-Bench 0.822 (FP16 FLUX.2 Klein 4B: 0.819, 12.84, and 0.853). The ternary companion is the quality-oriented variant: it uses a slightly larger representation (1.21 GB) to come very close to the original FLUX.2 Klein 4B model in visual quality and prompt fidelity (for example, DPG-Bench 0.851 vs 0.853). Together, the Bonsai Image variants move the quality-footprint frontier, bringing modern diffusion-transformer behavior into a memory range previously occupied by much smaller, lower-capability models.
## Use Cases

- **Local creative tooling**: image generation directly on Mac, iPhone, and iPad
- **Private generation**: prompts and generated assets can remain local
- **Rapid iteration**: lower local latency and no remote queue for iterative creative workflows
- **Mobile deployment**: image generation on devices with unified-memory, thermal, and connectivity constraints
- **Commodity-GPU serving**: lower transformer footprint and reduced memory pressure for serving on CUDA GPUs
- **Enterprise and controlled inference**: local or private environments for data-residency and compliance-sensitive workflows

## Limitations

- 1-bit Bonsai Image 4B is not bit-identical to the FP16 FLUX.2 Klein 4B model; it is a compact binary-weight deployment designed to deliver similar practical behavior at a much smaller size.
- Image-generation quality remains prompt- and workflow-dependent. Small text, fine details, object counts, and strict compositional constraints should be evaluated for the target use case.
- Current commodity inference stacks do not yet expose fully native binary execution as a standard hardware path. This release uses practical MLX low-bit kernel paths on Apple Silicon and Gemlite low-bit GEMM on CUDA.
- Once the diffusion transformer is compact, other components such as the VAE can become the more visible memory bottlenecks. The runtime mitigates this with text-encoder offload and tiled VAE decoding.

## Citation

```bibtex
@techreport{bonsaiimage4b,
  title       = {Bonsai Image 4B: Low-Bit Diffusion on Apple Silicon and Consumer GPUs},
  author      = {{Prism ML}},
  institution = {Prism ML},
  year        = {2026},
  month       = {May},
  url         = {https://prismml.com}
}
```

## Contact

For questions, feedback, or collaboration inquiries: **contact@prismml.com**