# zenz-CoreML
Core ML export of Miwa-Keita/zenz-v3.1-small for Apple platforms.
See CHANGELOG.md for version-to-version changes, docs/stateful-runtime-notes.md for the current stateful runtime contract, and docs/performance.md for the detailed benchmark log.
This repo is organized for Hugging Face Hub delivery, not GitHub Releases or SwiftPM binary targets. The intended upload payload is:
- `Artifacts/stateless/zenz-stateless-fp16.mlpackage`
- `Artifacts/stateless/zenz-stateless-8bit.mlpackage`
- `Artifacts/stateful/zenz-stateful-fp16.mlpackage`
- `Artifacts/stateful/zenz-stateful-8bit.mlpackage`
- `tokenizer/*`
- `hf_manifest.json`
The original model remains the source of truth for tokenizer semantics, weights provenance, and training lineage. This Core ML port should be linked back to the upstream model when published on Hugging Face.
## Runtime shape
- `stateless` is the whole-sequence baseline.
- `stateful` is the single-model cached generation path.
The stateful model keeps the same Core ML state layout:
- `keyCache`
- `valueCache`
The current stateful runtime contract is:
- prefill incrementally over the prompt
- decode one token at a time
- reuse the same Core ML state across calls
- provide an `attention_mask` that reflects the active sequence length during decode
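The contract above can be sketched with a stub standing in for the Core ML stateful network. The stub class, its `predict` signature, and the token ids are all hypothetical illustrations (the real model would be driven through `coremltools` or the Core ML Swift API); what the sketch shows is the loop shape: one token per call, state reused implicitly, and an `attention_mask` whose popcount always equals the number of cached tokens.

```python
import numpy as np

class StubModel:
    """Hypothetical stand-in for the Core ML stateful model.

    Tracks how many tokens sit in keyCache/valueCache and asserts that
    the caller's attention_mask matches that active sequence length.
    """
    def __init__(self, vocab=6000, max_len=16):
        self.cached = 0          # tokens already in keyCache/valueCache
        self.vocab = vocab
        self.max_len = max_len

    def predict(self, input_ids, attention_mask):
        assert input_ids.shape == (1, 1)             # one token per call
        self.cached += 1                             # cache grows by one
        assert int(attention_mask.sum()) == self.cached
        return np.random.rand(1, 1, self.vocab)      # fake logits

def generate(model, prompt_ids, n_new):
    """Incremental prefill over the prompt, then greedy one-token decode."""
    mask = np.zeros((1, model.max_len), dtype=np.int32)
    out = list(prompt_ids)
    pos = 0
    # prefill: push prompt tokens through one at a time, reusing the state
    for tok in prompt_ids:
        mask[0, pos] = 1                             # extend active length
        logits = model.predict(np.array([[tok]]), mask)
        pos += 1
    # decode: pick the next token from the last logits, feed it back
    for _ in range(n_new):
        nxt = int(np.argmax(logits[0, -1]))
        out.append(nxt)
        mask[0, pos] = 1
        logits = model.predict(np.array([[nxt]]), mask)
        pos += 1
    return out
```

The same loop applies on-device; only the `predict` call changes to a Core ML prediction that carries the model's state object.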
## Compute Units
- stateless: `.all`
- stateful: `.cpuAndGPU`
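Assuming the artifact layout listed above, pinning each variant to its compute units with `coremltools` looks roughly like this. This is a configuration sketch, not a runnable test: loading `.mlpackage` files requires macOS and the actual artifacts.

```python
import coremltools as ct

# Stateless baseline may use any available compute unit (CPU/GPU/ANE).
stateless = ct.models.MLModel(
    "Artifacts/stateless/zenz-stateless-fp16.mlpackage",
    compute_units=ct.ComputeUnit.ALL,
)

# Stateful cached-generation path is restricted to CPU and GPU.
stateful = ct.models.MLModel(
    "Artifacts/stateful/zenz-stateful-fp16.mlpackage",
    compute_units=ct.ComputeUnit.CPU_AND_GPU,
)
```

In a Swift app the equivalent is setting `MLModelConfiguration().computeUnits` to `.all` or `.cpuAndGPU` before loading the model.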
## Benchmark

### Summary
| Device | Stateful FP16 (mean latency) | Stateful 8-bit (mean latency) |
|---|---|---|
| iPhone Air | 0.436 | 0.431 |
| iPhone 12 | 1.124 | 1.041 |
### Recommendation
- `iPhone 15 Pro` and newer: use `Stateful FP16` first.
- Devices older than `iPhone 15 Pro`: use the `Stateful 8-bit` model first.
### Notes
- On iPhone Air, both stateful variants produced correct outputs in the recorded run.
- On iPhone Air, `8-bit` was slightly faster on mean latency than `FP16`.
- On iPhone 12, `8-bit` was faster on mean latency than `FP16`.
- On iPhone 12, the recorded `FP16` run under `.all` showed degraded outputs, while the recorded `8-bit` run remained correct.
- The current recommendation is based on `stateful` running under `.cpuAndGPU`, not `.all`.
- My current read is that `FP16` is the better top-end option when the device can sustain it cleanly, but `8-bit` is the safer deployment default once you care about broader keyboard coverage.
- In other words: `FP16` is the premium path, `8-bit` is the compatibility path.
See docs/performance.md for the detailed tables and case-level notes.
## Local export
```sh
python -m pip install -r requirements.txt
python Scripts/export_all.py
```
Or run each stage separately:
```sh
python convert-to-CoreML.py
python convert-to-CoreML-Stateful.py
```
## Model tree

Base model: `Miwa-Keita/zenz-v3.1-small`