aoiandroid committed on
Commit 36ff36f · verified · 1 Parent(s): bd030c2

docs: add evaluation results and environment to model card (openai_whisper-large-v3-v20240930_turbo_632MB)

Files changed (1)
  1. README.md +55 -2
README.md CHANGED
```diff
@@ -1,4 +1,3 @@
-
 ---
 pretty_name: "WhisperKit"
 viewer: false
@@ -11,13 +10,67 @@ tags:
 - quantized
 - automatic-speech-recognition
 ---
+
 # WhisperKit
 
 WhisperKit is an on-device speech recognition framework for Apple Silicon:
 https://github.com/argmaxinc/WhisperKit
 
 Check out the WhisperKit paper and presentation from ICML 2025:
-https://icml.cc/virtual/2025/47854
+https://icml.cc/virtual/2025/47854
 
 For real-time streaming API, custom vocabulary, speaker diarization, and more, check out Argmax SDK: https://www.argmaxinc.com/blog/argmax-sdk-2
 
+---
+
+## Evaluation: openai_whisper-large-v3-v20240930_turbo_632MB
+
+Transcription test results for the turbo 632MB model from this repo (aoiandroid/whisperkit-coreml).
+
+### Environment
+
+| Item | Value |
+|------|--------|
+| Platform | macOS 14.x (arm64, Apple Silicon) |
+| WhisperKit | [argmaxinc/WhisperKit](https://github.com/argmaxinc/WhisperKit) 0.15.0+ (Swift Package) |
+| Model repo | aoiandroid/whisperkit-coreml |
+| Test date | 2026-03-17 |
+| Audio formats | m4a, mp3, wav, flac |
+
+### Test results (14 files, multi-language)
+
+| File | Language / Content | Note |
+|------|--------------------|------|
+| English.mp3 | English | Texas travel narration (Gage Hotel, Padre Island, Corpus Christi, seafood); stable long-form transcription |
+| Euskara.mp3 | Basque | Speech on language and identity |
+| Guaraní.mp3 | Guarani | Short speech |
+| Yorùbá.mp3 | Yoruba | Education and future |
+| afrikaasns.mp3 | Afrikaans | Value of learning a new language |
+| arabic.mp3 | Arabic | Speech on hope and future (full Arabic) |
+| bengali.m4a | Bengali | Some mixed-language / recognition errors |
+| chinese.mp3 | Chinese | Long explanation on smart traffic systems |
+| isiZulu.mp3 | isiZulu | Future, education, youth |
+| kiswahili.mp3 | Kiswahili | Unity (umoja) |
+| korean.mp3 | Korean | "On challenge" (도전에 대하여) |
+| russinan.m4a | Russian | Russia–Latin America parliamentary conference (with some English at end) |
+| test.mp3 | Japanese | Typhoon 14 news; high accuracy |
+| 日本語.mp3 | Japanese | Ostrich facts / comedy; high accuracy |
+
+### Quality notes
+
+- **English**: Stable long-form narration.
+- **Japanese**: High accuracy on news and narrative (test.mp3, 日本語.mp3).
+- **Korean, Chinese, Arabic, Russian**: Consistent recognition on long content.
+- **Multilingual**: Many segments reported as [en] by the model while source language was correctly transcribed.
+- **Bengali**: Some mixed script/errors.
+
+### Reproduce
+
+```bash
+cd TranslateBluePackage
+WHISPERKIT_TEST_AUDIO_DIR=/path/to/input/audio \
+WHISPERKIT_TEST_LOG_DIR=/path/to/Log \
+swift test --filter WhisperKitAOIAndroidModelTests
+```
+
+(Use `WhisperKitConfig(model: "openai_whisper-large-v3-v20240930_turbo_632MB", modelRepo: "aoiandroid/whisperkit-coreml")` in your Swift code.)
```
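The last line added by this commit gives the `WhisperKitConfig` arguments but not the calls around them. The Swift sketch below shows one way to use them to transcribe every audio file in a folder. It is not the code behind `WhisperKitAOIAndroidModelTests`, which this commit does not include. It rests on several assumptions. `collectAudioFiles` and `transcribeAll` are hypothetical helpers. Filtering on m4a/mp3/wav/flac mirrors the Environment table. The `WhisperKit(_:)` initializer, `transcribe(audioPath:)` returning `[TranscriptionResult]`, and the `text`/`language` fields follow the WhisperKit README for recent releases, so check them against the version you pin. The WhisperKit part sits behind `#if canImport(WhisperKit)`, so the file-listing helper still compiles without the package.

```swift
import Foundation
#if canImport(WhisperKit)
import WhisperKit
#endif

// Hypothetical helper: returns the audio files directly inside `directory`
// (no recursion), keeping only the formats listed in the Environment table,
// sorted by file name so runs are processed in a stable order.
func collectAudioFiles(in directory: URL) throws -> [URL] {
    let supportedAudioExtensions: Set<String> = ["m4a", "mp3", "wav", "flac"]
    let entries = try FileManager.default.contentsOfDirectory(
        at: directory, includingPropertiesForKeys: nil)
    return entries
        .filter { supportedAudioExtensions.contains($0.pathExtension.lowercased()) }
        .sorted { $0.lastPathComponent < $1.lastPathComponent }
}

#if canImport(WhisperKit)
// Hypothetical helper: loads the turbo 632MB model from aoiandroid/whisperkit-coreml
// and prints the reported language and text for each file.
// Assumes WhisperKit(_:) and transcribe(audioPath:) -> [TranscriptionResult],
// as documented for recent WhisperKit releases.
func transcribeAll(in directory: URL) async throws {
    let config = WhisperKitConfig(
        model: "openai_whisper-large-v3-v20240930_turbo_632MB",
        modelRepo: "aoiandroid/whisperkit-coreml"
    )
    let pipe = try await WhisperKit(config)
    for file in try collectAudioFiles(in: directory) {
        let results = try await pipe.transcribe(audioPath: file.path)
        for result in results {
            print("\(file.lastPathComponent) [\(result.language)]: \(result.text)")
        }
    }
}
#endif
```

Printing the reported language next to each transcript may help surface the behavior described under Quality notes, where many segments come back tagged `[en]` even though the source language is transcribed correctly. To reuse the folder from the test run, one option is `URL(fileURLWithPath: ProcessInfo.processInfo.environment["WHISPERKIT_TEST_AUDIO_DIR"] ?? ".")`.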