zenz-CoreML

Core ML export of Miwa-Keita/zenz-v3.1-small for Apple platforms.

See CHANGELOG.md for version-to-version changes, docs/stateful-runtime-notes.md for the current stateful runtime contract, and docs/performance.md for the detailed benchmark log.

This repo is organized for Hugging Face Hub delivery, not GitHub Releases or SwiftPM binary targets. The intended upload payload is:

  • Artifacts/stateless/zenz-stateless-fp16.mlpackage
  • Artifacts/stateless/zenz-stateless-8bit.mlpackage
  • Artifacts/stateful/zenz-stateful-fp16.mlpackage
  • Artifacts/stateful/zenz-stateful-8bit.mlpackage
  • tokenizer/*
  • hf_manifest.json

The original model remains the source of truth for tokenizer semantics, weights provenance, and training lineage. This Core ML port should be linked back to the upstream model when published on Hugging Face.

Runtime shape

  • stateless is the whole-sequence baseline.
  • stateful is the single-model cached generation path.

The stateful model keeps the same Core ML state layout:

  • keyCache
  • valueCache

The current stateful runtime contract is:

  • prefill incrementally over the prompt
  • decode one token at a time
  • reuse the same Core ML state
  • provide an attention_mask that reflects the active sequence length during decode
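The contract above can be sketched end to end. This is a minimal sketch, not the exported model's actual I/O: `predict`, `MAX_LEN`, and the mask shape are illustrative assumptions standing in for the stateful Core ML prediction call.

```python
import numpy as np

MAX_LEN = 64  # illustrative fixed sequence length of the exported model


def make_mask(active_len: int, max_len: int = MAX_LEN) -> np.ndarray:
    """attention_mask with 1s over the active positions, 0s elsewhere."""
    mask = np.zeros((1, max_len), dtype=np.int32)
    mask[0, :active_len] = 1
    return mask


def run(prompt_ids, predict, steps=8):
    """Prefill one token at a time, then decode greedily.

    `predict(token_id, position, mask)` stands in for a stateful Core ML
    prediction that returns next-token logits; the KV cache lives inside
    the reused Core ML state, so each step only feeds the newest token
    and a mask reflecting the active sequence length.
    """
    pos = 0
    logits = None
    for tok in prompt_ids:                      # incremental prefill
        logits = predict(tok, pos, make_mask(pos + 1))
        pos += 1
    out = []
    for _ in range(steps):                      # incremental decode
        nxt = int(np.argmax(logits))
        out.append(nxt)
        logits = predict(nxt, pos, make_mask(pos + 1))
        pos += 1
    return out
```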

Compute Units

  • stateless: .all
  • stateful: .cpuAndGPU
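As a sketch, the same compute-unit split can be expressed when loading the packages with coremltools for host-side checks (the paths are the upload payload above; on-device Swift code would set `MLModelConfiguration.computeUnits` instead):

```python
import coremltools as ct

# Stateless baseline: let Core ML pick any compute unit.
stateless = ct.models.MLModel(
    "Artifacts/stateless/zenz-stateless-fp16.mlpackage",
    compute_units=ct.ComputeUnit.ALL,
)

# Stateful cached-generation path: pin to CPU+GPU, matching the
# recommendation in this README (the recorded .all FP16 run on
# iPhone 12 showed degraded outputs).
stateful = ct.models.MLModel(
    "Artifacts/stateful/zenz-stateful-fp16.mlpackage",
    compute_units=ct.ComputeUnit.CPU_AND_GPU,
)
```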

Benchmark

Summary

Mean latency (lower is better; units and methodology in docs/performance.md):

  Device        Stateful FP16   Stateful 8-bit
  iPhone Air    0.436           0.431
  iPhone 12     1.124           1.041
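For context, the relative gap implied by the table (same numbers, just rearranged):

```python
# Mean latencies from the summary table above.
bench = {
    "iPhone Air": {"fp16": 0.436, "8bit": 0.431},
    "iPhone 12":  {"fp16": 1.124, "8bit": 1.041},
}


def gain_percent(fp16: float, int8: float) -> float:
    """How much faster 8-bit is than FP16, as a percentage."""
    return round((fp16 - int8) / fp16 * 100, 1)


for device, t in bench.items():
    print(f"{device}: 8-bit is {gain_percent(t['fp16'], t['8bit'])}% faster")
# iPhone Air: ~1% faster; iPhone 12: ~7% faster
```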

Recommendation

  • iPhone 15 Pro and newer: start with the Stateful FP16 model.
  • Devices older than the iPhone 15 Pro: start with the Stateful 8-bit model.

Notes

  • On iPhone Air, both stateful variants produced correct outputs in the recorded run.
  • On iPhone Air, 8-bit was slightly faster on mean latency than FP16.
  • On iPhone 12, 8-bit was faster on mean latency than FP16.
  • On iPhone 12, the recorded FP16 run under .all showed degraded outputs while the recorded 8-bit run remained correct.
  • The current recommendation is based on stateful running under .cpuAndGPU, not .all.
  • My current read: FP16 is the better top-end option when the device can sustain it cleanly, but 8-bit is the safer deployment default when broader device coverage matters.
  • In short: FP16 is the premium path, 8-bit is the compatibility path.

See docs/performance.md for the detailed tables and case-level notes.

Local export

python -m pip install -r requirements.txt
python Scripts/export_all.py

Or run each stage separately:

python convert-to-CoreML.py            # stateless export
python convert-to-CoreML-Stateful.py   # stateful export