kotoba-whisper-v2.2 CoreML

This is the CoreML conversion of kotoba-tech/kotoba-whisper-v2.2 for use with WhisperKit.

Model Details

Base Model: kotoba-tech/kotoba-whisper-v2.2
Language: Japanese (ja)
Format: CoreML (.mlmodelc)
Optimized for: Apple Silicon (ANE - Apple Neural Engine)

Included Files

File	Description	ANE Support
`AudioEncoder.mlmodelc`	Audio feature encoder	100%
`TextDecoder.mlmodelc`	Text decoder	98%
`MelSpectrogram.mlmodelc`	Mel spectrogram converter	72%
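
Before handing the folder to WhisperKit, it can help to confirm that all three bundles listed above are actually present. The following is a minimal sketch that uses only Foundation; `requiredModels` and `missingModels(in:)` are hypothetical names introduced for illustration and are not part of WhisperKit, and the folder path is a placeholder.

```swift
import Foundation

// File names taken from the "Included Files" table above.
let requiredModels = [
    "AudioEncoder.mlmodelc",
    "TextDecoder.mlmodelc",
    "MelSpectrogram.mlmodelc",
]

/// Returns the names of required model bundles that are absent from `folder`.
/// Hypothetical helper for illustration; not a WhisperKit API.
func missingModels(in folder: URL, fileManager: FileManager = .default) -> [String] {
    requiredModels.filter { name in
        var isDirectory: ObjCBool = false
        let path = folder.appendingPathComponent(name).path
        // A compiled .mlmodelc bundle is a directory, not a single file.
        let exists = fileManager.fileExists(atPath: path, isDirectory: &isDirectory)
        return !(exists && isDirectory.boolValue)
    }
}

// Usage: report anything missing before loading the models.
let folder = URL(fileURLWithPath: "path/to/kotoba-tech_kotoba-whisper-v2.2")
let missing = missingModels(in: folder)
if !missing.isEmpty {
    print("Missing model bundles: \(missing.joined(separator: ", "))")
}
```

Checking for a directory rather than a plain file reflects how compiled Core ML models are stored on disk; a stray file with the same name would be reported as missing.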

Usage with WhisperKit

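import WhisperKit

// Load the converted CoreML models from a local folder.
let whisperKit = try await WhisperKit(
    modelFolder: "path/to/kotoba-tech_kotoba-whisper-v2.2"
)

// The language is set through DecodingOptions rather than passed to transcribe directly.
let options = DecodingOptions(language: "ja")

let result = try await whisperKit.transcribe(
    audioPath: "path/to/audio.wav",
    decodeOptions: options
)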

Notes

This is a distilled model with only 2 decoder layers (vs. 32 in the original Whisper large model).
Token-level timestamps are disabled because the alignment heads configuration is incompatible with the distilled architecture.

License

This model is released under the Apache License 2.0, following the original model's license.

Attribution

This is a derivative work based on:

kotoba-tech/kotoba-whisper-v2.2 - The original Japanese Whisper model by Kotoba Technologies
OpenAI Whisper - The base Whisper architecture
Distil-Whisper - Distillation codebase
ReazonSpeech - Japanese speech dataset

Acknowledgments

kotoba-tech for the original model
argmaxinc for WhisperKit and whisperkittools
