Instructions to use argmaxinc/whisperkit-coreml with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- WhisperKit
How to use argmaxinc/whisperkit-coreml with WhisperKit:
    # Install CLI with Homebrew on macOS device
    brew install whisperkit-cli

    # View all available inference options
    whisperkit-cli transcribe --help

    # Download and run inference using whisper base model
    whisperkit-cli transcribe --audio-path /path/to/audio.mp3

    # Or use your preferred model variant
    whisperkit-cli transcribe --model "large-v3" --model-prefix "distil" --audio-path /path/to/audio.mp3 --verbose
ONNX/TFLite: the mobile inference formats
#30
by 3morixd - opened
We test models in both GGUF (llama.cpp) and ONNX/TFLite formats on our phone farm.
Findings: ONNX Runtime is faster for small models (<500M parameters) on Snapdragon, while GGUF/llama.cpp is better for larger models (1B+ parameters) due to memory-mapped loading.
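For readers unfamiliar with memory-mapped loading, here is a minimal Python sketch of the general idea: the file is mapped into the address space and only the pages actually touched are read from storage, so a large weights file does not have to be copied into RAM up front. This is an illustration only. The file is a stand-in binary blob rather than a real GGUF model, and the helper names `make_dummy_weights` and `read_slice_mmap` are hypothetical, not part of llama.cpp or any other library.

```python
import mmap
import os
import tempfile


def make_dummy_weights(size_bytes):
    """Write a stand-in 'weights' file (hypothetical; not a real GGUF file)."""
    pattern = bytes(range(256))
    fd, path = tempfile.mkstemp(suffix=".bin")
    with os.fdopen(fd, "wb") as f:
        # Repeat the 0..255 byte pattern until the file reaches size_bytes.
        f.write(pattern * (size_bytes // len(pattern)))
    return path


def read_slice_mmap(path, offset, length):
    """Return `length` bytes starting at `offset` via a read-only memory map."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; pages are faulted in only when accessed.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return mapped[offset:offset + length]


path = make_dummy_weights(1 << 20)  # 1 MiB stand-in file
chunk = read_slice_mmap(path, offset=4096, length=8)
print(chunk)
os.remove(path)
```

Slicing the map touches only the pages that back the requested range, which is why this style of loading tends to help as model files grow.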
The choice of format matters as much as the choice of model. We benchmark both at dispatchAI.
- Dispatch AI (FZE), Sharjah, UAE