Qwen3-1.7B: Apple Core AI Export (iPhone GPU)

Pre-converted Apple Core AI (.aimodelc) bundle of Qwen/Qwen3-1.7B, produced with the coreai-models export recipe and presented without modification. Hashes are embedded so the artifact is a reproducible reference point.
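As an illustration of how a downloaded bundle could be checked against published hashes, here is a minimal sketch. It assumes SHA-256 and a simple path-to-digest mapping; the actual hash algorithm and the location of the embedded hashes are not stated here, so `bundle_manifest` and `changed_files` are hypothetical helpers, not part of the export recipe.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def bundle_manifest(bundle_dir: Path) -> dict[str, str]:
    """Map each file's POSIX-style relative path to its SHA-256 digest."""
    return {
        p.relative_to(bundle_dir).as_posix(): sha256_file(p)
        for p in sorted(bundle_dir.rglob("*"))
        if p.is_file()
    }


def changed_files(bundle_dir: Path, expected: dict[str, str]) -> list[str]:
    """List paths that are missing, extra, or whose digest differs."""
    actual = bundle_manifest(bundle_dir)
    return sorted(
        name
        for name in actual.keys() | expected.keys()
        if actual.get(name) != expected.get(name)
    )
```

Calling `changed_files(Path("ios-gpu"), expected)` with a trusted `expected` mapping returns an empty list when every file matches.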

This fills the missing 1.7B rung in the dense Qwen3 Core AI line (0.6B / 4B / 8B already exist). 1.7B is a meaningful rung: it is the largest dense Qwen3 that still invokes on LiteRT-LM iOS, measured alongside this bundle in a neutral cross-runtime benchmark.

Why GPU-only (no ANE bundle)

The 0.6B repo ships both an ANE (static-shape, palettized) and a GPU (dynamic INT4) bundle. At 1.7B the ANE export is omitted on purpose: the static-shape ANE bundle loads but fails to invoke on iOS 27 (a full benchmark run window produced no output), the same ANE invoke ceiling the 4B static export hits. Rather than ship a bundle that does not run, only the GPU (dynamic INT4) export is published here; it invokes and decodes cleanly on device. (Core AI's GPU path is unaffected: it runs 0.6B/1.7B/4B on iPhone; the ANE path is the one that tops out below 1.7B.)

Bundle

| Path | Target | Compute unit | Quant | On disk |
|---|---|---|---|---|
| ios-gpu/ | iPhone (h18p) | GPU (coreai-pipelined) | dynamic INT4 | 939 MB |

Embedded tokenizer (Qwen/Qwen3-1.7B), 40960 max context. iOS bundles are already AOT-compiled (.aimodelc) for the iPhone 17 Pro GPU target.
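The on-disk size in the bundle table can be sanity-checked against the INT4 label with a little arithmetic. The sketch below assumes a nominal parameter count of 1.7e9 (taken from the model name, not an exact count) and decimal megabytes; both are assumptions, and the remainder over a pure 4-bit layout is only a rough indication of whatever else the bundle carries.

```python
ON_DISK_MB = 939          # from the bundle table
NOMINAL_PARAMS = 1.7e9    # assumption: taken from the model name, not an exact count
BYTES_PER_MB = 1_000_000  # assumption: decimal megabytes


def effective_bits_per_param(size_mb: float, params: float) -> float:
    """Total on-disk bits divided by the parameter count."""
    return size_mb * BYTES_PER_MB * 8 / params


def pure_int4_size_mb(params: float) -> float:
    """Size of the weights alone at exactly 4 bits each, with no metadata."""
    return params * 4 / 8 / BYTES_PER_MB


bits = effective_bits_per_param(ON_DISK_MB, NOMINAL_PARAMS)
floor_mb = pure_int4_size_mb(NOMINAL_PARAMS)
print(f"effective bits/param: {bits:.2f}")
print(f"4-bit floor: {floor_mb:.0f} MB, remainder: {ON_DISK_MB - floor_mb:.0f} MB")
```

Under these assumptions the bundle sits a little above 4 bits per parameter, which is what one would expect if quantization scales, the tokenizer, or any higher-precision tensors are stored alongside the INT4 weights.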

Measured: iPhone 17 Pro (iPhone18,1 · iOS 27.0)

Greedy decoding; 128-token budget for the short-chat runs (n=3, iso-cold), 256 tokens for the quality check. Every figure traces to raw JSONL in the companion benchmark.
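To show how a headline figure might be recomputed from raw JSONL, here is a minimal sketch that averages one metric per run condition. The record schema is not described here, so the key names passed in (for example `decode_tok_s` and `condition`) are hypothetical and would need to be replaced with the benchmark's real field names.

```python
import json
from statistics import mean
from typing import Iterable


def mean_by_condition(
    lines: Iterable[str], value_key: str, group_key: str
) -> dict[str, float]:
    """Average `value_key` per distinct `group_key` across JSONL records."""
    groups: dict[str, list[float]] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        if value_key not in record or group_key not in record:
            continue  # skip records that do not carry this metric
        groups.setdefault(str(record[group_key]), []).append(float(record[value_key]))
    return {name: mean(values) for name, values in groups.items()}
```

Because an open text file iterates line by line, it can be passed directly: `mean_by_condition(open("runs.jsonl"), "decode_tok_s", "condition")`, where the file name is again only a placeholder.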

| Metric | Value |
|---|---|
| Decode | 44.7 tok/s cold → ~66 tok/s warm |
| TTFT (warm) | ~29 ms |
| Prefill | ~750 tok/s |
| Peak RAM | 248 MB |
| Quality (8 checkable Qs) | 8 / 8, not degenerate |

The kernel cache persists across launches, so only the first launch after a fresh install is genuinely cold (44.7 tok/s). Once primed it holds ~66 tok/s, the steady state a user actually sees and the fastest of the runtimes measured at this size (vs MLX Q4 ~66 tok/s at 1095 MB, LiteRT-LM int8 ~30 tok/s at 512 MB).
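To put the cold and warm figures in user-facing terms, the sketch below estimates end-to-end response time as prefill time plus decode time. This is a deliberately simple linear model, not how the benchmark computes anything: the 20-token prompt is an arbitrary assumption, and the same ~750 tok/s prefill rate is assumed for both cold and warm launches.

```python
PREFILL_TOK_S = 750.0      # from the table (approximate)
DECODE_COLD_TOK_S = 44.7   # first launch after a fresh install
DECODE_WARM_TOK_S = 66.0   # steady state once the kernel cache is primed


def response_seconds(
    prompt_tokens: int,
    output_tokens: int,
    decode_tok_s: float,
    prefill_tok_s: float = PREFILL_TOK_S,
) -> float:
    """Estimated wall time: prompt processing plus token-by-token decode."""
    return prompt_tokens / prefill_tok_s + output_tokens / decode_tok_s


# Assumption: a 20-token prompt with the 128-token short-chat budget.
cold = response_seconds(20, 128, DECODE_COLD_TOK_S)
warm = response_seconds(20, 128, DECODE_WARM_TOK_S)
print(f"cold: {cold:.2f} s, warm: {warm:.2f} s")
print(f"decode speedup warm vs cold: {DECODE_WARM_TOK_S / DECODE_COLD_TOK_S:.2f}x")
```

For short prompts the decode term dominates, so the cold-to-warm difference shows up almost entirely as a shorter generation phase.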

Usage

iOS bundles are AOT-compiled and side-loaded into the app container; load them via Core AI with the embedded tokenizer and the coreai-pipelined GPU engine. See the coreai-models recipe and CoreAIChatMac for an interactive harness.

Provenance

Converted from Qwen/Qwen3-1.7B (Apache-2.0). Quant: dynamic INT4 (linear). Producer: coreai-build-3600.67.5.8.1.
