# Model Card for Breeze-ASR-25 CoreML

## Model Details

- **Model Name**: Breeze-ASR-25 CoreML
- **Model Type**: Automatic Speech Recognition (ASR)
- **Format**: CoreML (.mlmodelc)
- **Base Model**: [MediaTek-Research/Breeze-ASR-25](https://huggingface.co/MediaTek-Research/Breeze-ASR-25)
- **Developer**: MediaTek Research
- **License**: Apache 2.0

## Model Description

Breeze-ASR-25 CoreML is an automatic speech recognition model optimized for Apple Silicon devices. It was converted from the original PyTorch weights to CoreML format for efficient on-device inference with WhisperKit.

## Intended Use

### Primary Use Cases

- Real-time speech-to-text transcription
- On-device ASR applications
- Mobile and desktop speech recognition
- Privacy-preserving speech processing

### Target Users

- iOS/macOS developers
- Mobile app developers
- Researchers in speech processing
- Companies requiring on-device ASR

## Model Architecture

The model consists of three main components, applied in sequence:

1. **MelSpectrogram**: Converts raw audio into a log-mel spectrogram representation
2. **AudioEncoder**: Encodes the mel spectrogram into acoustic feature vectors
3. **TextDecoder**: Autoregressively generates the text transcription from the encoded audio features

## Performance

### Accuracy

- High accuracy across the languages and accents covered by the base model
- Optimized for conversational speech
- Robust to background noise

### Efficiency

- Optimized for Apple Silicon (M1/M2/M3)
- Low memory footprint
- Fast inference
- Fully on-device processing (no internet connection required)

## Training Data

Based on the original Breeze-ASR-25 training data, which includes:

- Large-scale multilingual speech datasets
- Varied acoustic conditions
- Multiple languages and accents

## Limitations

- Primarily optimized for Apple Silicon devices
- Requires iOS 16.0+ or macOS 13.0+
- Performance may vary on older Apple devices
- Limited to the languages supported by the base model

## Ethical Considerations

- Use the model responsibly
- Consider the privacy implications of processing speech data
- Obtain appropriate consent before recording audio
- Be aware of potential biases in speech recognition output

## Technical Specifications

### System Requirements

- **Platform**: iOS 16.0+ or macOS 13.0+
- **Hardware**: Apple Silicon (M1/M2/M3) recommended
- **Memory**: Minimum 4GB RAM
- **Storage**: ~500MB for model files

### Model Files

- `AudioEncoder.mlmodelc/` - Audio encoder model
- `MelSpectrogram.mlmodelc/` - Mel spectrogram processor
- `TextDecoder.mlmodelc/` - Text decoder model
- `*.mlcomputeplan.json` - Compute plans for optimization

## Usage Examples

The snippets below use WhisperKit's Swift API; exact signatures may vary slightly across WhisperKit versions.

### Basic Usage

```swift
import WhisperKit

// Load the model and transcribe an audio file
let pipe = try await WhisperKit(model: "your-username/Breeze-ASR-25_coreml")
let results = try await pipe.transcribe(audioPath: "audio.wav")
print(results.map(\.text).joined())
```

### Advanced Usage

```swift
// With custom decoding parameters
var options = DecodingOptions()
options.language = "en"
options.task = .transcribe
options.temperature = 0.0

let results = try await pipe.transcribe(audioPath: "audio.wav", decodeOptions: options)
```

## Citation

```bibtex
@article{breeze-asr-25-coreml,
  title={Breeze-ASR-25 CoreML: On-Device Speech Recognition for Apple Silicon},
  author={MediaTek Research},
  journal={Hugging Face Model Hub},
  year={2024}
}
```

## Contact
For questions or issues related to this model, please contact MediaTek Research or create an issue in the model repository.
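
## Appendix: Log-Mel Front End Sketch

For readers implementing their own preprocessing or sanity-checking the MelSpectrogram stage described above, the sketch below computes a log-mel spectrogram in plain NumPy. The parameter values (16 kHz sample rate, 400-sample window, 160-sample hop, 80 mel bins) are the standard Whisper front-end settings and are assumed here rather than read from the converted model; the converted model's exact normalization may differ.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels=80, n_fft=400, sr=16000):
    # Triangular filters with centers spaced evenly on the mel scale.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, ctr, hi = bins[i], bins[i + 1], bins[i + 2]
        for j in range(lo, ctr):
            fb[i, j] = (j - lo) / max(ctr - lo, 1)
        for j in range(ctr, hi):
            fb[i, j] = (hi - j) / max(hi - ctr, 1)
    return fb

def log_mel_spectrogram(audio, sr=16000, n_fft=400, hop=160, n_mels=80):
    # Frame the signal, window each frame, FFT, apply mel filters, take log.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(audio) - n_fft) // hop
    frames = np.stack([audio[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = power @ mel_filterbank(n_mels, n_fft, sr).T
    # Clamp before the log to avoid -inf on silent frames.
    return np.log10(np.maximum(mel, 1e-10))

# One second of a 440 Hz tone at 16 kHz.
t = np.arange(16000) / 16000.0
mel = log_mel_spectrogram(np.sin(2 * np.pi * 440.0 * t))
print(mel.shape)  # → (98, 80): 98 frames, 80 mel bins
```

Each 80-dimensional frame covers 25 ms of audio with a 10 ms stride, which is the input shape the AudioEncoder stage consumes.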