# Model Card for Breeze-ASR-25 CoreML

## Model Details
- Model Name: Breeze-ASR-25 CoreML
- Model Type: Automatic Speech Recognition (ASR)
- Format: CoreML (.mlmodelc)
- Base Model: MediaTek-Research/Breeze-ASR-25
- Developer: MediaTek Research
- License: Apache 2.0
## Model Description

Breeze-ASR-25 CoreML is an automatic speech recognition model optimized for Apple Silicon devices. It has been converted from the original PyTorch weights to CoreML format for efficient on-device inference with WhisperKit.
## Intended Use

### Primary Use Cases
- Real-time speech-to-text transcription
- On-device ASR applications
- Mobile and desktop speech recognition
- Privacy-preserving speech processing
### Target Users
- iOS/macOS developers
- Mobile app developers
- Researchers in speech processing
- Companies requiring on-device ASR
## Model Architecture

The model consists of three main components, applied in sequence:

- MelSpectrogram: Converts raw audio into a log-mel spectrogram representation
- AudioEncoder: Encodes the mel spectrogram into acoustic feature representations
- TextDecoder: Autoregressively generates the text transcription from the encoded features
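This mirrors the standard Whisper-style pipeline. As a rough illustration of what the MelSpectrogram stage computes, here is a minimal NumPy sketch assuming Whisper-style front-end parameters (16 kHz input, 400-sample FFT window, 160-sample hop, 80 mel bins); the converted model's exact settings may differ.

```python
import numpy as np

# Assumed Whisper-style front-end parameters (not read from the CoreML model)
SAMPLE_RATE, N_FFT, HOP, N_MELS = 16_000, 400, 160, 80

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    hz_pts = 700.0 * (10.0 ** (mel_pts / 2595.0) - 1.0)
    bins = np.floor((n_fft + 1) * hz_pts / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb

def log_mel_spectrogram(audio):
    # Frame the signal, apply a Hann window, and take the magnitude STFT.
    window = np.hanning(N_FFT)
    n_frames = 1 + (len(audio) - N_FFT) // HOP
    frames = np.stack([audio[i * HOP : i * HOP + N_FFT] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=N_FFT)) ** 2
    mel = power @ mel_filterbank(SAMPLE_RATE, N_FFT, N_MELS).T
    return np.log10(np.maximum(mel, 1e-10)).T  # shape: (n_mels, n_frames)

# One second of noise yields an (80, 98) feature matrix with these parameters.
features = log_mel_spectrogram(np.random.default_rng(0).normal(size=SAMPLE_RATE))
print(features.shape)
```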
## Performance

### Accuracy
- High accuracy on various languages and accents
- Optimized for conversational speech
- Robust to background noise
### Efficiency
- Optimized for Apple Silicon (M1/M2/M3)
- Low memory footprint
- Fast inference speed
- On-device processing (no internet required)
## Training Data
Based on the original Breeze-ASR-25 training data, which includes:
- Large-scale multilingual speech datasets
- Various acoustic conditions
- Multiple languages and accents
## Limitations
- Primarily optimized for Apple Silicon devices
- Requires iOS 16.0+ or macOS 13.0+
- Performance may vary on older Apple devices
- Limited to supported languages in the base model
## Ethical Considerations
- The model should be used responsibly
- Consider privacy implications of speech data
- Ensure appropriate consent for audio recording
- Be aware of potential biases in speech recognition
## Technical Specifications

### System Requirements
- Platform: iOS 16.0+ or macOS 13.0+
- Hardware: Apple Silicon (M1/M2/M3) recommended
- Memory: Minimum 4GB RAM
- Storage: ~500MB for model files
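An app can sanity-check the OS requirement above at runtime before loading the model. The helper below is a hypothetical sketch (not part of any SDK), assuming the version string comes from something like Python's `platform.mac_ver()`:

```python
# Hypothetical check against the macOS 13.0+ requirement; the helper name
# and the two-component comparison are illustrative assumptions.
def meets_min_os(ver_str, minimum=(13, 0)):
    parts = tuple(int(p) for p in (ver_str.split(".") + ["0"])[:2])
    return parts >= minimum

print(meets_min_os("13.4.1"))  # True
print(meets_min_os("12.6"))    # False
```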
### Model Files

- `AudioEncoder.mlmodelc/` - Audio encoder model
- `MelSpectrogram.mlmodelc/` - Mel spectrogram processor
- `TextDecoder.mlmodelc/` - Text decoder model
- `*.mlcomputeplan.json` - Compute plans for optimization
## Usage Examples

### Basic Usage

WhisperKit is a Swift framework, so the snippet below is a Swift sketch. The exact initializer and return types vary across WhisperKit versions, and the repository id is a placeholder; run it from an async context.

```swift
import WhisperKit

// Load the model (repo id is a placeholder)
let pipe = try await WhisperKit(model: "your-username/Breeze-ASR-25_coreml")

// Transcribe an audio file; recent WhisperKit versions return an array of results
let results = try await pipe.transcribe(audioPath: "audio.wav")
print(results.map(\.text).joined(separator: " "))
```

### Advanced Usage

Custom decoding parameters can be passed via `DecodingOptions` (field names shown are from recent WhisperKit releases and may differ in other versions):

```swift
import WhisperKit

// With custom decoding options (repo id is a placeholder)
let pipe = try await WhisperKit(model: "your-username/Breeze-ASR-25_coreml")
let options = DecodingOptions(task: .transcribe, language: "en", temperature: 0.0)
let results = try await pipe.transcribe(audioPath: "audio.wav", decodeOptions: options)
```
## Citation

```bibtex
@misc{breeze-asr-25-coreml,
  title={Breeze-ASR-25 CoreML: On-Device Speech Recognition for Apple Silicon},
  author={MediaTek Research},
  howpublished={Hugging Face Model Hub},
  year={2024}
}
```
## Contact
For questions or issues related to this model, please contact MediaTek Research or create an issue in the model repository.