kb-whisper-small CoreML encoder

CoreML-compiled encoder bundle for KBLab/kb-whisper-small, intended for use with whisper.cpp on iOS and Apple Silicon Macs to offload encoder inference to the Apple Neural Engine (ANE).

What this is

ggml-kb-whisper-small-encoder.mlmodelc.zip — zipped .mlmodelc bundle
INT8 weight-quantized via coremltools.optimize.coreml.linear_quantize_weights
85 MB unpacked / 76 MB zipped
Built from the KBLab/kb-whisper-small PyTorch weights with convert-h5-to-coreml.py from whisper.cpp v1.8.4, patched to use the modern ML Program quantization API. The stock script's --quantize True path fails on coremltools ≥ 9 because it calls the NeuralNetwork-era quantize_weights API.
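To make "INT8 symmetric" concrete, here is a minimal pure-Python sketch of symmetric linear quantization. It illustrates the arithmetic only and is not coremltools' implementation: it uses a single per-tensor scale for simplicity, whereas coremltools may use a finer granularity, and the function names are hypothetical.

```python
def quantize_symmetric_int8(weights):
    """Map floats to integers in [-127, 127] with one shared scale.

    Symmetric means the zero point is fixed at 0, so only the scale
    has to be stored alongside the 8-bit values.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the 8-bit values."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    q, scale = quantize_symmetric_int8([-1.0, 0.0, 0.25, 1.0])
    print(q, scale, dequantize(q, scale))
```

Each weight is stored in one byte instead of two or four, which is where the size reduction of the bundle comes from; the reconstruction error per weight is at most half a quantization step.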

Usage with whisper.cpp on iOS / Mac

Build whisper.cpp / xcframework with WHISPER_COREML=1
Unzip this bundle and place the ggml-kb-whisper-small-encoder.mlmodelc directory next to your ggml-model-q5_0.bin (or whichever GGML weights you're using from KBLab/kb-whisper-small)
whisper.cpp detects the bundle automatically and uses it. The first run on each device triggers a ~60 s Apple Neural Engine compile; the result is cached for later runs.
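The unzip-and-place step can be sketched in Python as below. The sketch rests on two assumptions that are worth checking against your own checkout and download: that whisper.cpp derives the Core ML directory name from the GGML filename by dropping the extension, dropping a trailing quantization suffix such as -q5_0, and appending -encoder.mlmodelc; and that the zip contains exactly one top-level .mlmodelc directory. The helper names are hypothetical.

```python
import zipfile
from pathlib import Path


def expected_encoder_dir(ggml_path: Path) -> Path:
    """Directory name whisper.cpp is ASSUMED to look for next to the GGML file."""
    name = ggml_path.name
    dot = name.rfind(".")
    if dot != -1:
        name = name[:dot]  # drop ".bin"
    dash = name.rfind("-")
    if dash != -1:
        suffix = name[dash:]
        # Drop a 5-character quantization suffix of the form "-qX_Y".
        if len(suffix) == 5 and suffix[1] == "q" and suffix[3] == "_":
            name = name[:dash]
    return ggml_path.with_name(name + "-encoder.mlmodelc")


def install_bundle(zip_path: Path, ggml_path: Path) -> Path:
    """Unzip the bundle beside the GGML file, renaming it to the expected name."""
    target = expected_encoder_dir(ggml_path)
    with zipfile.ZipFile(zip_path) as zf:
        roots = {n.split("/", 1)[0] for n in zf.namelist()}
        bundles = sorted(r for r in roots if r.endswith(".mlmodelc"))
        if len(bundles) != 1:
            raise ValueError(f"expected one .mlmodelc directory, found {bundles}")
        # Extract only the bundle itself, skipping any archive metadata folders.
        members = [n for n in zf.namelist() if n.startswith(bundles[0] + "/")]
        zf.extractall(ggml_path.parent, members=members)
    extracted = ggml_path.parent / bundles[0]
    # Rename only when needed and only if nothing already occupies the target.
    if extracted != target and not target.exists():
        extracted.rename(target)
    return target
```

Under the assumed naming rule, a GGML file called ggml-model-q5_0.bin would pair with ggml-model-encoder.mlmodelc, so the bundle may need renaming if whisper.cpp does not pick it up under its shipped name.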

Expected speedup

~2× encoder throughput on Apple Silicon vs. Metal-only, with corresponding battery savings on long-form transcription. Most useful on older iPhones, where the Metal-only encoder struggles to stay ahead of live audio.
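Whether a device "stays ahead of live audio" comes down to a simple budget: each audio chunk must be processed in less time than the chunk lasts. The sketch below shows that arithmetic. Note that the speedup applies to the encoder only, so the decoder time is unchanged. The timings in the example are made-up placeholders, not measurements, and the function names are hypothetical.

```python
def keeps_up(encoder_s: float, decoder_s: float, chunk_s: float) -> bool:
    """True if one chunk of audio is processed faster than it is recorded."""
    return encoder_s + decoder_s < chunk_s


def encoder_time_after_speedup(encoder_s: float, speedup: float) -> float:
    """Encoder time once throughput is multiplied by `speedup`."""
    return encoder_s / speedup


if __name__ == "__main__":
    # Hypothetical device: 10 s chunks, 8 s encoder, 4 s decoder.
    chunk_s, encoder_s, decoder_s = 10.0, 8.0, 4.0
    print("Metal-only keeps up:", keeps_up(encoder_s, decoder_s, chunk_s))
    faster = encoder_time_after_speedup(encoder_s, 2.0)
    print("With 2x encoder keeps up:", keeps_up(faster, decoder_s, chunk_s))
```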

Provenance

Step	Tool	Version
Source weights	KBLab/kb-whisper-small	as of 2026-05-18
Convert HF → mlpackage	whisper.cpp `convert-h5-to-coreml.py`	v1.8.4 (patched)
Quantize	coremltools `optimize.coreml.linear_quantize_weights`	9.0, INT8 symmetric
Compile	`xcrun coremlc`	Xcode CLI tools on macOS 15
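The quantize and compile rows of the table can be sketched roughly as follows. This is a reconstruction under assumptions, not the exact script used to build this bundle: the file names are hypothetical, and the quantizer settings are simply the ones implied by "INT8 symmetric". coremltools is imported lazily so the command helper works without it installed; the compile step itself requires macOS with the Xcode command-line tools.

```python
import subprocess
from pathlib import Path


def quantize_int8(mlpackage_in: Path, mlpackage_out: Path) -> None:
    """INT8 symmetric weight quantization of an ML Program .mlpackage."""
    import coremltools as ct
    import coremltools.optimize.coreml as cto

    op_config = cto.OpLinearQuantizerConfig(mode="linear_symmetric", dtype="int8")
    config = cto.OptimizationConfig(global_config=op_config)
    model = ct.models.MLModel(str(mlpackage_in))
    cto.linear_quantize_weights(model, config=config).save(str(mlpackage_out))


def coremlc_compile_command(mlpackage: Path, out_dir: Path) -> list[str]:
    """Argument list for compiling an .mlpackage into an .mlmodelc bundle."""
    return ["xcrun", "coremlc", "compile", str(mlpackage), str(out_dir)]


def compile_mlmodelc(mlpackage: Path, out_dir: Path) -> Path:
    """Run the compiler and return the path of the resulting bundle."""
    subprocess.run(coremlc_compile_command(mlpackage, out_dir), check=True)
    return out_dir / (mlpackage.stem + ".mlmodelc")


if __name__ == "__main__":
    # Hypothetical intermediate file names.
    fp16 = Path("coreml-encoder-kb-whisper-small.mlpackage")
    int8 = Path("coreml-encoder-kb-whisper-small-int8.mlpackage")
    quantize_int8(fp16, int8)
    print(compile_mlmodelc(int8, Path(".")))
```

The compiled directory would then be renamed to ggml-kb-whisper-small-encoder.mlmodelc and zipped to produce the file distributed here.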

License

Apache 2.0, matching the upstream KB-Whisper license.

