kb-whisper-small CoreML encoder

CoreML-compiled encoder bundle for KBLab/kb-whisper-small, intended for use with whisper.cpp on iOS and Apple Silicon Macs to offload encoder inference to the Apple Neural Engine (ANE).

What this is

  • ggml-kb-whisper-small-encoder.mlmodelc.zip — zipped .mlmodelc bundle
  • INT8 weight-quantized via coremltools.optimize.coreml.linear_quantize_weights
  • 85 MB unpacked / 76 MB zipped
  • Built from the KBLab/kb-whisper-small PyTorch weights with convert-h5-to-coreml.py from whisper.cpp v1.8.4, patched to use the modern ML Program quantization API. The stock script's --quantize True path fails on coremltools ≥ 9 because it calls the NeuralNetwork-era quantize_weights API.
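
The quantization step can be sketched roughly as follows. This is a minimal illustration of the coremltools.optimize.coreml API, not the exact patch applied to convert-h5-to-coreml.py. The function name and file paths are hypothetical, and every setting other than INT8 symmetric is left at the library default as an assumption.

```python
def quantize_encoder_weights(src_mlpackage: str, dst_mlpackage: str) -> None:
    """Apply INT8 symmetric weight quantization to an ML Program encoder.

    Hypothetical helper: the paths are placeholders, not files shipped here.
    """
    # Imported lazily so this file can be loaded without coremltools installed.
    import coremltools as ct
    import coremltools.optimize.coreml as cto

    model = ct.models.MLModel(src_mlpackage)

    # INT8 symmetric linear quantization, applied as the global op config.
    op_config = cto.OpLinearQuantizerConfig(mode="linear_symmetric", dtype="int8")
    config = cto.OptimizationConfig(global_config=op_config)

    quantized = cto.linear_quantize_weights(model, config=config)
    quantized.save(dst_mlpackage)
```

A call such as quantize_encoder_weights("coreml-encoder.mlpackage", "coreml-encoder-int8.mlpackage") would write the quantized package, which still has to be compiled to .mlmodelc before whisper.cpp can load it.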

Usage with whisper.cpp on iOS / Mac

  1. Build whisper.cpp / xcframework with WHISPER_COREML=1
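  2. Unzip this bundle and place the ggml-kb-whisper-small-encoder.mlmodelc directory next to the GGML weights you're using from KBLab/kb-whisper-small. whisper.cpp derives the encoder path from the GGML filename (it drops the extension and a trailing quantization suffix such as -q5_0, then appends -encoder.mlmodelc), so the two names must share a stem: ggml-kb-whisper-small-q5_0.bin pairs with this bundle, whereas ggml-model-q5_0.bin would need the bundle renamed to ggml-model-encoder.mlmodelc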
  3. whisper.cpp detects the bundle automatically and uses it. The first run on each device triggers a ~60 s Apple Neural Engine compile; the result is cached for later runs
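
To check that the bundle will be found, it can help to compute the directory name whisper.cpp is expected to look for. The sketch below mirrors my reading of the path derivation in whisper.cpp (strip the extension, strip a trailing "-qN_M" suffix, append "-encoder.mlmodelc"). Treat that rule as an assumption and verify it against the whisper.cpp version you build; the function name is hypothetical.

```python
def expected_coreml_encoder_path(ggml_path: str) -> str:
    """Return the .mlmodelc path assumed to be derived from a GGML model path."""
    stem = ggml_path

    # Drop the file extension (everything after the last dot).
    dot = stem.rfind(".")
    if dot != -1:
        stem = stem[:dot]

    # Drop a trailing quantization suffix of the form "-qN_M", e.g. "-q5_0".
    dash = stem.rfind("-")
    if dash != -1:
        suffix = stem[dash:]
        if len(suffix) == 5 and suffix[1] == "q" and suffix[3] == "_":
            stem = stem[:dash]

    return stem + "-encoder.mlmodelc"


if __name__ == "__main__":
    for name in ("ggml-kb-whisper-small-q5_0.bin", "ggml-model-q5_0.bin"):
        print(name, "->", expected_coreml_encoder_path(name))
```

Under that assumption, only a GGML file whose stem is ggml-kb-whisper-small maps to the directory name shipped in this bundle; for any other filename, rename either the weights or the .mlmodelc directory so the two agree.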

Expected speedup

~2× encoder throughput on Apple Silicon compared with Metal-only inference, with corresponding battery savings on long-form transcription. Most useful on older iPhones, where a Metal-only encoder struggles to keep up with live audio.

Provenance

Step                   | Tool                                                | Version
Source weights         | KBLab/kb-whisper-small                              | as of 2026-05-18
Convert HF → mlpackage | whisper.cpp convert-h5-to-coreml.py                 | v1.8.4 (patched)
Quantize               | coremltools optimize.coreml.linear_quantize_weights | 9.0, INT8 symmetric
Compile                | xcrun coremlc                                       | Xcode CLI tools on macOS 15
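
The final compile step can be scripted along these lines. This is a sketch only: it assumes macOS with the Xcode command-line tools installed, and it assumes coremlc writes a directory named after the input package's stem. The function names and paths are hypothetical and are not the exact commands used to produce this bundle.

```python
import subprocess
from pathlib import Path


def coremlc_compile_command(mlpackage: str, out_dir: str) -> list[str]:
    """Build the argv for compiling an .mlpackage into an .mlmodelc bundle."""
    return ["xcrun", "coremlc", "compile", mlpackage, out_dir]


def compile_and_rename(mlpackage: str, out_dir: str, final_name: str) -> Path:
    """Compile the package, then rename the output to the name whisper.cpp expects."""
    subprocess.run(coremlc_compile_command(mlpackage, out_dir), check=True)

    # Assumption: the compiled bundle is <out_dir>/<package stem>.mlmodelc.
    compiled = Path(out_dir) / (Path(mlpackage).stem + ".mlmodelc")
    target = Path(out_dir) / final_name
    compiled.rename(target)
    return target
```

For this bundle, final_name would be ggml-kb-whisper-small-encoder.mlmodelc.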

License

Apache 2.0, matching the upstream KB-Whisper license.
