kotoba-whisper-v2.2 CoreML
This is the CoreML conversion of kotoba-tech/kotoba-whisper-v2.2 for use with WhisperKit.
Model Details
- Base Model: kotoba-tech/kotoba-whisper-v2.2
- Language: Japanese (ja)
- Format: CoreML (.mlmodelc)
- Optimized for: Apple Silicon (ANE - Apple Neural Engine)
Included Files
| File | Description | ANE Support |
|---|---|---|
| AudioEncoder.mlmodelc | Audio feature encoder | 100% |
| TextDecoder.mlmodelc | Text decoder | 98% |
| MelSpectrogram.mlmodelc | Mel spectrogram converter | 72% |
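
Before loading the model, it can help to confirm that the folder actually contains the three compiled bundles listed above. The following is a minimal sketch using only Foundation. The helper name `missingModelFiles(in:)` and the folder path are illustrative assumptions, not part of WhisperKit, and the check covers only the files named in the table.

```swift
import Foundation

// The three compiled model bundles listed in the table above.
let requiredModelFiles = [
    "AudioEncoder.mlmodelc",
    "TextDecoder.mlmodelc",
    "MelSpectrogram.mlmodelc",
]

/// Returns the names of required bundles that are absent from `folder`.
/// Hypothetical helper for illustration; not a WhisperKit API.
func missingModelFiles(in folder: URL) -> [String] {
    requiredModelFiles.filter { name in
        // .mlmodelc bundles are directories; fileExists(atPath:) is true for those too.
        !FileManager.default.fileExists(atPath: folder.appendingPathComponent(name).path)
    }
}

// Placeholder path; replace with the actual location of the downloaded folder.
let modelFolder = URL(fileURLWithPath: "path/to/kotoba-tech_kotoba-whisper-v2.2")
let missing = missingModelFiles(in: modelFolder)
if missing.isEmpty {
    print("All model files found.")
} else {
    print("Missing: \(missing.joined(separator: ", "))")
}
```

An empty result means every bundle from the table is present. It does not verify that the bundles are valid or that they load on a given device.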
Usage with WhisperKit
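import WhisperKit

// Load the converted model from a local folder.
let whisperKit = try await WhisperKit(
    modelFolder: "path/to/kotoba-tech_kotoba-whisper-v2.2"
)

// Transcribe a Japanese audio file; the language is passed through DecodingOptions.
let result = try await whisperKit.transcribe(
    audioPath: "path/to/audio.wav",
    decodeOptions: DecodingOptions(language: "ja")
)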
Notes
- This is a distilled model with only 2 decoder layers (vs 32 in the original Whisper large model)
- Token-level timestamps are disabled because the alignment heads configuration is incompatible with the distilled architecture
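
Because token-level timestamps are unavailable, callers may want to leave word timestamps switched off explicitly when building decoding options. The sketch below assumes WhisperKit's `DecodingOptions` accepts `language` and `wordTimestamps` arguments and that `transcribe(audioPath:decodeOptions:)` is available in the installed version. Both paths are placeholders, and the exact return type of `transcribe` differs between WhisperKit releases.

```swift
import WhisperKit

// Assumption: DecodingOptions exposes `language` and `wordTimestamps`.
// Word timestamps stay off because this conversion does not support them.
let options = DecodingOptions(
    language: "ja",
    wordTimestamps: false
)

// Placeholder paths; replace with the real model folder and audio file.
let whisperKit = try await WhisperKit(
    modelFolder: "path/to/kotoba-tech_kotoba-whisper-v2.2"
)
let output = try await whisperKit.transcribe(
    audioPath: "path/to/audio.wav",
    decodeOptions: options
)
```

Setting `wordTimestamps: false` matches what is believed to be the default, so the explicit flag mainly documents intent for readers of the calling code.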
License
This model is released under the Apache License 2.0, following the original model's license.
Attribution
This is a derivative work based on:
- kotoba-tech/kotoba-whisper-v2.2 - The original Japanese Whisper model by Kotoba Technologies
- OpenAI Whisper - The base Whisper architecture
- Distil-Whisper - Distillation codebase
- ReazonSpeech - Japanese speech dataset
Acknowledgments
- kotoba-tech for the original model
- argmaxinc for WhisperKit and whisperkittools