kb-whisper-small CoreML encoder
CoreML-compiled encoder bundle for KBLab/kb-whisper-small, intended for use with whisper.cpp on iOS and Apple Silicon Macs to offload encoder inference to the Apple Neural Engine (ANE).
What this is
- `ggml-kb-whisper-small-encoder.mlmodelc.zip`: zipped `.mlmodelc` bundle
- INT8 weight-quantized via `coremltools.optimize.coreml.linear_quantize_weights`
- 85 MB unpacked / 76 MB zipped
- Built from KBLab/kb-whisper-small PyTorch weights using whisper.cpp v1.8.4's `convert-h5-to-coreml.py`, patched to use the modern ML Program quantization API (the stock script's `--quantize True` path fails on coremltools ≥ 9 because it uses the NeuralNetwork-era `quantize_weights` API)
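As a rough illustration of the quantization step, the sketch below shows what symmetric INT8 weight quantization does numerically and how the ML Program API named above might be called. It is not the exact patch applied to `convert-h5-to-coreml.py`: the function names and file paths are hypothetical, the numeric helper uses a single per-tensor scale for simplicity (coremltools may use a finer granularity), and `quantize_mlpackage` assumes coremltools is installed.

```python
from typing import List, Tuple


def quantize_symmetric_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Illustrative per-tensor symmetric INT8: q = round(w / scale), zero point 0."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Map the largest magnitude onto 127; fall back to 1.0 for all-zero input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from INT8 values and their scale."""
    return [q * scale for q in quantized]


def quantize_mlpackage(src_path: str, dst_path: str) -> None:
    """Hypothetical wrapper around the coremltools ML Program quantization API."""
    # Imported lazily so the numeric helpers above work without coremltools.
    import coremltools as ct
    import coremltools.optimize.coreml as cto

    model = ct.models.MLModel(src_path)
    op_config = cto.OpLinearQuantizerConfig(mode="linear_symmetric", dtype="int8")
    config = cto.OptimizationConfig(global_config=op_config)
    quantized = cto.linear_quantize_weights(model, config=config)
    quantized.save(dst_path)
```

For example, `quantize_mlpackage("encoder.mlpackage", "encoder-int8.mlpackage")` would write a quantized copy of a hypothetical `encoder.mlpackage`; both file names are placeholders.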
Usage with whisper.cpp on iOS / Mac
- Build whisper.cpp / xcframework with `WHISPER_COREML=1`
- Unzip this bundle and place the `ggml-kb-whisper-small-encoder.mlmodelc` directory next to your `ggml-model-q5_0.bin` (or whichever GGML weights you're using from KBLab/kb-whisper-small)
- whisper.cpp auto-detects and uses it; the first run on each device triggers a ~60 s Apple Neural Engine compile, which is then cached
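The unzip-and-place step can be scripted. The following is a minimal sketch, assuming the zip holds a single `.mlmodelc` directory at its top level; `install_encoder` is a hypothetical helper name, not part of whisper.cpp.

```python
import zipfile
from pathlib import Path


def install_encoder(zip_path: str, ggml_model_path: str) -> Path:
    """Extract the .mlmodelc bundle into the directory holding the GGML weights."""
    model_dir = Path(ggml_model_path).resolve().parent
    with zipfile.ZipFile(zip_path) as archive:
        # Collect top-level entries and keep only the compiled model directory.
        top_level = {name.split("/", 1)[0] for name in archive.namelist()}
        bundles = sorted(n for n in top_level if n.endswith(".mlmodelc"))
        if len(bundles) != 1:
            raise ValueError(f"expected one .mlmodelc directory, found {bundles}")
        archive.extractall(model_dir)
    return model_dir / bundles[0]
```

Calling `install_encoder("ggml-kb-whisper-small-encoder.mlmodelc.zip", "models/ggml-model-q5_0.bin")` would leave the bundle in `models/`, next to the weights; the `models/` path is only an example.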
Expected speedup
~2× encoder throughput on Apple Silicon vs. Metal-only, with corresponding battery savings on long-form transcription. Most useful on older iPhones, where a Metal-only encoder struggles to keep up with live audio.
Provenance
| Step | Tool | Version |
|---|---|---|
| Source weights | KBLab/kb-whisper-small | as of 2026-05-18 |
| Convert HF → mlpackage | whisper.cpp `convert-h5-to-coreml.py` | v1.8.4 (patched) |
| Quantize | coremltools `optimize.coreml.linear_quantize_weights` | 9.0, INT8 symmetric |
| Compile | `xcrun coremlc` | Xcode CLI tools on macOS 15 |
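The compile step in the table can be sketched as a small wrapper around `xcrun coremlc compile`. The exact invocation used for this bundle is not recorded here, so treat the helper names and paths below as assumptions; running it requires macOS with the Xcode command line tools.

```python
import subprocess
from typing import List


def coremlc_compile_command(mlpackage_path: str, output_dir: str) -> List[str]:
    """Build the argv for compiling an .mlpackage into an .mlmodelc directory."""
    return ["xcrun", "coremlc", "compile", mlpackage_path, output_dir]


def compile_mlpackage(mlpackage_path: str, output_dir: str) -> None:
    """Run the compiler and raise CalledProcessError if it fails."""
    subprocess.run(coremlc_compile_command(mlpackage_path, output_dir), check=True)
```

The compiled directory takes its name from the input package, so it may need renaming to `ggml-kb-whisper-small-encoder.mlmodelc` afterwards.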
License
Apache 2.0, matching the upstream KB-Whisper license.