whisperkit-coreml

@@ -1,4 +1,3 @@
 ---
 pretty_name: "WhisperKit"
 viewer: false
@@ -11,13 +10,67 @@ tags:
 - quantized
 - automatic-speech-recognition
 ---
 # WhisperKit
 WhisperKit is an on-device speech recognition framework for Apple Silicon:
 https://github.com/argmaxinc/WhisperKit
 Check out the WhisperKit paper and presentation from ICML 2025:
-https://icml.cc/virtual/2025/47854
 For real-time streaming API, custom vocabulary, speaker diarization, and more, check out Argmax SDK: https://www.argmaxinc.com/blog/argmax-sdk-2

 ---
 pretty_name: "WhisperKit"
 viewer: false
 - quantized
 - automatic-speech-recognition
 ---
 # WhisperKit
 WhisperKit is an on-device speech recognition framework for Apple Silicon:
 https://github.com/argmaxinc/WhisperKit
 Check out the WhisperKit paper and presentation from ICML 2025:
+https://icml.cc/virtual/2025/47854
 For real-time streaming API, custom vocabulary, speaker diarization, and more, check out Argmax SDK: https://www.argmaxinc.com/blog/argmax-sdk-2
+---
+## Evaluation: openai_whisper-large-v3-v20240930_turbo_632MB
+Transcription test results for the turbo 632MB model from this repo (aoiandroid/whisperkit-coreml).
+### Environment
+| Item | Value |
+|------|--------|
+| Platform | macOS 14.x (arm64, Apple Silicon) |
+| WhisperKit | [argmaxinc/WhisperKit](https://github.com/argmaxinc/WhisperKit) 0.15.0+ (Swift Package) |
+| Model repo | aoiandroid/whisperkit-coreml |
+| Test date | 2026-03-17 |
+| Audio formats | m4a, mp3, wav, flac |
+### Test results (14 files, multi-language)
+| File | Language / Content | Note |
+|------|--------------------|------|
+| English.mp3 | English | Texas travel narration (Gage Hotel, Padre Island, Corpus Christi, seafood); stable long-form transcription |
+| Euskara.mp3 | Basque | Speech on language and identity |
+| Guaraní.mp3 | Guarani | Short speech |
+| Yorùbá.mp3 | Yoruba | Education and future |
+| afrikaasns.mp3 | Afrikaans | Value of learning a new language |
+| arabic.mp3 | Arabic | Speech on hope and future (full Arabic) |
+| bengali.m4a | Bengali | Some mixed-language / recognition errors |
+| chinese.mp3 | Chinese | Long explanation on smart traffic systems |
+| isiZulu.mp3 | isiZulu | Future, education, youth |
+| kiswahili.mp3 | Kiswahili | Unity (umoja) |
+| korean.mp3 | Korean | "On challenge" (도전에 대하여) |
+| russinan.m4a | Russian | Russia–Latin America parliamentary conference (with some English at the end) |
+| test.mp3 | Japanese | Typhoon 14 news; high accuracy |
+| 日本語.mp3 | Japanese | Ostrich facts / comedy; high accuracy |
+### Quality notes
+- **English**: Stable long-form narration.
+- **Japanese**: High accuracy on news and narrative (test.mp3, 日本語.mp3).
+- **Korean, Chinese, Arabic, Russian**: Consistent recognition on long content.
+- **Multilingual**: The model tagged many segments as [en] even though the text itself was transcribed correctly in the source language.
+- **Bengali**: Some mixed-script output and recognition errors.
+### Reproduce
+```bash
+cd TranslateBluePackage
+WHISPERKIT_TEST_AUDIO_DIR=/path/to/input/audio \
+WHISPERKIT_TEST_LOG_DIR=/path/to/Log \
+swift test --filter WhisperKitAOIAndroidModelTests
+```
+To use this model in your own Swift code, create the configuration with `WhisperKitConfig(model: "openai_whisper-large-v3-v20240930_turbo_632MB", modelRepo: "aoiandroid/whisperkit-coreml")`.
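
Before running the reproduce command from the diff, it may help to confirm that the input directory actually contains audio in the formats listed under Environment (m4a, mp3, wav, flac). The following is a minimal pre-flight sketch, not part of WhisperKit or of the test target: `count_audio_files` is a hypothetical helper introduced here for illustration, and it assumes the audio files sit directly in the directory that `WHISPERKIT_TEST_AUDIO_DIR` points to rather than in subfolders.

```bash
#!/usr/bin/env bash
# Pre-flight check (illustrative only): count audio files per extension
# in the directory that WHISPERKIT_TEST_AUDIO_DIR points to.
# count_audio_files is a hypothetical helper, not a WhisperKit command.
count_audio_files() {
  local dir="$1" ext n
  if [ ! -d "$dir" ]; then
    echo "error: audio directory not found: $dir" >&2
    return 1
  fi
  # Extensions taken from the "Audio formats" row of the Environment table.
  for ext in m4a mp3 wav flac; do
    # Top-level files only; tr strips the padding that BSD wc adds on macOS.
    n=$(find "$dir" -maxdepth 1 -type f -name "*.${ext}" | wc -l | tr -d ' ')
    echo "${ext}: ${n}"
  done
}

# Run the check only when the variable is set, so sourcing this file is harmless.
if [ -n "${WHISPERKIT_TEST_AUDIO_DIR:-}" ]; then
  count_audio_files "$WHISPERKIT_TEST_AUDIO_DIR"
fi
```

If every count is zero, the test run would presumably have nothing to transcribe, so checking the path first can save a full `swift test` build cycle.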