---
license: apache-2.0
tags:
- automatic-speech-recognition
- coreml
- whisperkit
- apple-silicon
- asr
- on-device
- breeze
- mediatek
model_type: automatic-speech-recognition
library_name: whisperkit
pipeline_tag: automatic-speech-recognition
---

# Breeze-ASR-25 CoreML

This model is based on [MediaTek-Research/Breeze-ASR-25](https://huggingface.co/MediaTek-Research/Breeze-ASR-25), a state-of-the-art automatic speech recognition (ASR) model.
It has been converted to the CoreML format for compatibility with WhisperKit, enabling efficient ASR inference on Apple Silicon devices.

## Model Description

Breeze-ASR-25 is a high-performance automatic speech recognition model developed by MediaTek Research. This CoreML version runs inference on-device on Apple Silicon through WhisperKit.

## Model Components

This repository contains three CoreML models:

1. **AudioEncoder.mlmodelc** - Audio feature encoder
2. **MelSpectrogram.mlmodelc** - Mel spectrogram processor
3. **TextDecoder.mlmodelc** - Text decoder for transcription
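
After downloading the repository, it can be useful to confirm that all three components listed above are present before pointing a runtime at the folder. The following is a minimal sketch, not part of WhisperKit: the helper name `missing_components` is hypothetical, and it assumes the compiled `.mlmodelc` bundles are directories sitting at the top level of the local model folder.

```python
from pathlib import Path

# Component names taken from the list above.
EXPECTED_COMPONENTS = (
    "AudioEncoder.mlmodelc",
    "MelSpectrogram.mlmodelc",
    "TextDecoder.mlmodelc",
)


def missing_components(model_dir):
    """Return the expected component names that are not directories in model_dir.

    Hypothetical helper: assumes the bundles sit at the top level of model_dir.
    """
    root = Path(model_dir)
    return [name for name in EXPECTED_COMPONENTS if not (root / name).is_dir()]
```

An empty list means every expected bundle was found; otherwise the returned names show which ones are absent from the folder.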

## Usage

### With WhisperKit

```python
import whisperkit

# Load the model ("your-username" is a placeholder for the repository owner)
model = whisperkit.load_model("your-username/Breeze-ASR-25_coreml")

# Transcribe audio
result = model.transcribe("path/to/audio.wav")
print(result.text)
```

### Requirements

- macOS with Apple Silicon (M1/M2/M3)
- iOS 16.0+ or macOS 13.0+
- WhisperKit framework
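
The macOS part of these requirements can be checked from a script before attempting to load the model. This is a rough sketch using only Python's standard `platform` module; the function names are hypothetical, it does not cover iOS, and it assumes a native arm64 Python build (an interpreter running under Rosetta reports `x86_64` and would be rejected).

```python
import platform


def parse_version(text):
    """Turn a dotted version string such as '13.4.1' into a tuple of ints."""
    parts = [int(p) for p in text.split(".") if p.isdigit()]
    # Pad so that '13' compares equal to (13, 0) rather than below it.
    while len(parts) < 2:
        parts.append(0)
    return tuple(parts)


def is_supported_mac(system, machine, mac_version, minimum=(13, 0)):
    """Check the macOS requirements above: Apple Silicon and macOS 13.0+."""
    if system != "Darwin" or machine != "arm64":
        return False
    return parse_version(mac_version) >= minimum


if __name__ == "__main__":
    ok = is_supported_mac(
        platform.system(), platform.machine(), platform.mac_ver()[0]
    )
    print("supported" if ok else "not supported")
```

Taking the platform values as arguments, rather than reading them inside the function, keeps the check easy to exercise on any machine.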

## Performance

- Optimized for Apple Silicon devices
- On-device inference (no internet required)
- Low latency and memory usage
- High accuracy speech recognition

## License

This model is licensed under the Apache 2.0 License.

## Citation

If you use this model, please cite the original Breeze-ASR-25 paper:

```bibtex
@article{breeze-asr-25,
  title={Breeze-ASR-25: Efficient Speech Recognition for Mobile Devices},
  author={MediaTek Research},
  journal={arXiv preprint},
  year={2024}
}
```