# Model Card for Breeze-ASR-25 CoreML

## Model Details

- **Model Name**: Breeze-ASR-25 CoreML
- **Model Type**: Automatic Speech Recognition (ASR)
- **Format**: CoreML (.mlmodelc)
- **Base Model**: [MediaTek-Research/Breeze-ASR-25](https://huggingface.co/MediaTek-Research/Breeze-ASR-25)
- **Developer**: MediaTek Research
- **License**: Apache 2.0

## Model Description
Breeze-ASR-25 CoreML is a high-performance automatic speech recognition model optimized for Apple Silicon devices. It has been converted from the original PyTorch weights to CoreML format for efficient on-device inference with WhisperKit.
## Intended Use

### Primary Use Cases

- Real-time speech-to-text transcription
- On-device ASR applications
- Mobile and desktop speech recognition
- Privacy-preserving speech processing

### Target Users

- iOS/macOS developers
- Mobile app developers
- Researchers in speech processing
- Companies requiring on-device ASR
## Model Architecture

The model follows the standard Whisper pipeline and consists of three main components:

1. **MelSpectrogram**: Converts raw audio into a log-mel spectrogram representation
2. **AudioEncoder**: Encodes the mel spectrogram into acoustic feature representations
3. **TextDecoder**: Autoregressively generates the text transcription from the encoded features
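As a rough illustration of what the first stage computes, here is a simplified NumPy sketch of a Whisper-style log-mel front end, using the parameters commonly assumed for Whisper models (16 kHz audio, 400-point FFT, 160-sample hop, 80 mel bins). The actual `MelSpectrogram.mlmodelc` applies its own exact filterbank and scaling, so treat this only as a conceptual sketch:

```python
import numpy as np

SAMPLE_RATE = 16_000
N_FFT = 400   # 25 ms analysis window
HOP = 160     # 10 ms hop
N_MELS = 80

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels=N_MELS, n_fft=N_FFT, sr=SAMPLE_RATE):
    # Triangular mel filters spanning 0 Hz to the Nyquist frequency.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        for k in range(l, c):
            fb[i, k] = (k - l) / max(c - l, 1)  # rising slope
        for k in range(c, r):
            fb[i, k] = (r - k) / max(r - c, 1)  # falling slope
    return fb

def log_mel_spectrogram(audio):
    # Frame the signal, window, FFT, project onto mel filters, log-compress.
    window = np.hanning(N_FFT)
    n_frames = 1 + (len(audio) - N_FFT) // HOP
    frames = np.stack([audio[i * HOP:i * HOP + N_FFT] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = power @ mel_filterbank().T
    return np.log10(np.maximum(mel, 1e-10)).T  # shape: (n_mels, n_frames)

# One second of a 440 Hz tone as a stand-in for real speech audio.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
audio = 0.1 * np.sin(2 * np.pi * 440.0 * t).astype(np.float32)
mel = log_mel_spectrogram(audio)
print(mel.shape)  # (80, 98)
```

The AudioEncoder consumes a spectrogram of this shape, and the TextDecoder attends over the encoder output while emitting tokens one at a time.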
## Performance

### Accuracy

- Strong accuracy on Mandarin and English speech, including Mandarin-English code-switching (the focus of the base model)
- Optimized for conversational speech
- Robust to background noise

### Efficiency

- Optimized for Apple Silicon (M1/M2/M3)
- Low memory footprint
- Fast inference speed
- Fully on-device processing (no internet connection required)
## Training Data

Based on the original Breeze-ASR-25 training data, which includes:

- Large-scale multilingual speech datasets
- Various acoustic conditions
- Multiple languages and accents
## Limitations

- Primarily optimized for Apple Silicon devices
- Requires iOS 16.0+ or macOS 13.0+
- Performance may vary on older Apple devices
- Limited to the languages supported by the base model

## Ethical Considerations

- Use the model responsibly
- Consider the privacy implications of processing speech data
- Obtain appropriate consent before recording audio
- Be aware of potential biases in speech recognition systems
## Technical Specifications

### System Requirements

- **Platform**: iOS 16.0+ or macOS 13.0+
- **Hardware**: Apple Silicon (M1/M2/M3) recommended
- **Memory**: Minimum 4GB RAM
- **Storage**: ~500MB for model files

### Model Files

- `AudioEncoder.mlmodelc/` - Audio encoder model
- `MelSpectrogram.mlmodelc/` - Mel spectrogram processor
- `TextDecoder.mlmodelc/` - Text decoder model
- `*.mlcomputeplan.json` - Compute plans for optimization
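Before wiring the model into an app, a quick check can confirm that the compiled bundles listed above are present. The directory name below is a placeholder for wherever this repository was downloaded:

```python
from pathlib import Path

# Compiled CoreML bundles this model card lists.
EXPECTED = ["AudioEncoder.mlmodelc", "MelSpectrogram.mlmodelc", "TextDecoder.mlmodelc"]

def missing_components(model_dir):
    """Return the names of expected .mlmodelc bundles absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED if not (root / name).is_dir()]

# Placeholder path: point this at your local copy of the repository.
print(missing_components("Breeze-ASR-25_coreml"))
```

An empty list means all three components are in place; anything else names the bundles still to download.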
## Usage Examples

### Basic Usage

Inference runs through the [WhisperKit](https://github.com/argmaxinc/WhisperKit) Swift framework. The snippet below is a sketch based on WhisperKit's public API; exact initializer parameters and return types vary between releases, and the model folder path is a placeholder for wherever you downloaded this repository.

```swift
import WhisperKit

Task {
    // Point WhisperKit at the local folder containing the .mlmodelc bundles.
    let pipe = try await WhisperKit(modelFolder: "Breeze-ASR-25_coreml")

    // Transcribe an audio file entirely on-device.
    let result = try await pipe.transcribe(audioPath: "audio.wav")
    print(result?.text ?? "")
}
```

### Advanced Usage

Decoding behavior can be tuned through WhisperKit's `DecodingOptions` (again, field names follow the current API and may differ by version):

```swift
// Force English output with greedy (temperature 0) decoding.
let options = DecodingOptions(task: .transcribe, language: "en", temperature: 0.0)
let result = try await pipe.transcribe(audioPath: "audio.wav", decodeOptions: options)
```
## Citation

```bibtex
@misc{breeze-asr-25-coreml,
  title={Breeze-ASR-25 CoreML: On-Device Speech Recognition for Apple Silicon},
  author={MediaTek Research},
  howpublished={Hugging Face Model Hub},
  year={2024}
}
```
## Contact

For questions or issues related to this model, please contact MediaTek Research or open an issue in the model repository.