# Model Card for Breeze-ASR-25 CoreML
## Model Details
- **Model Name**: Breeze-ASR-25 CoreML
- **Model Type**: Automatic Speech Recognition (ASR)
- **Format**: CoreML (.mlmodelc)
- **Base Model**: [MediaTek-Research/Breeze-ASR-25](https://huggingface.co/MediaTek-Research/Breeze-ASR-25)
- **Developer**: MediaTek Research
- **License**: Apache 2.0
## Model Description
Breeze-ASR-25 CoreML is an automatic speech recognition model optimized for Apple Silicon devices. The model has been converted from the original PyTorch format to CoreML format for efficient on-device inference using WhisperKit.
## Intended Use
### Primary Use Cases
- Real-time speech-to-text transcription
- On-device ASR applications
- Mobile and desktop speech recognition
- Privacy-preserving speech processing
### Target Users
- iOS/macOS developers
- Mobile app developers
- Researchers in speech processing
- Companies requiring on-device ASR
## Model Architecture
The model consists of three main components, applied in sequence:
1. **MelSpectrogram**: Converts raw audio into a log-mel spectrogram representation
2. **AudioEncoder**: Encodes the mel spectrogram into audio feature embeddings
3. **TextDecoder**: Autoregressively generates the text transcription from the encoded audio features
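To illustrate the first stage, here is a minimal NumPy sketch of a log-mel spectrogram, assuming Whisper-style front-end parameters (16 kHz audio, 400-sample FFT window, 160-sample hop, 80 mel bins). The actual `MelSpectrogram.mlmodelc` may differ in padding and normalization details:

```python
import numpy as np

def hz_to_mel(f):
    """Convert frequency in Hz to the mel scale (HTK formula)."""
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_filterbank(n_mels=80, n_fft=400, sr=16000):
    """Build a triangular mel filterbank of shape (n_mels, n_fft // 2 + 1)."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_pts = 700.0 * (10.0 ** (mel_pts / 2595.0) - 1.0)
    bins = np.floor((n_fft + 1) * hz_pts / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        lo, ctr, hi = bins[m - 1], bins[m], bins[m + 1]
        for k in range(lo, ctr):          # rising edge of the triangle
            fb[m - 1, k] = (k - lo) / max(ctr - lo, 1)
        for k in range(ctr, hi):          # falling edge of the triangle
            fb[m - 1, k] = (hi - k) / max(hi - ctr, 1)
    return fb

def log_mel_spectrogram(audio, n_fft=400, hop=160, n_mels=80, sr=16000):
    """Windowed magnitude STFT -> mel filterbank -> log compression."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(audio) - n_fft) // hop
    frames = np.stack(
        [audio[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = power @ mel_filterbank(n_mels, n_fft, sr).T
    return np.log10(np.maximum(mel, 1e-10))

# One second of a 440 Hz tone at 16 kHz
t = np.arange(16000) / 16000.0
spec = log_mel_spectrogram(np.sin(2 * np.pi * 440.0 * t))
print(spec.shape)  # -> (98, 80): 98 frames, 80 mel bins
```

The AudioEncoder consumes this time-by-mel matrix, and the TextDecoder attends over the encoder output one token at a time.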
## Performance
### Accuracy
- High accuracy on various languages and accents
- Optimized for conversational speech
- Robust to background noise
### Efficiency
- Optimized for Apple Silicon (M1/M2/M3)
- Low memory footprint
- Fast inference speed
- On-device processing (no internet required)
## Training Data
Based on the original Breeze-ASR-25 training data, which includes:
- Large-scale multilingual speech datasets
- Various acoustic conditions
- Multiple languages and accents
## Limitations
- Primarily optimized for Apple Silicon devices
- Requires iOS 16.0+ or macOS 13.0+
- Performance may vary on older Apple devices
- Limited to supported languages in the base model
## Ethical Considerations
- The model should be used responsibly
- Consider privacy implications of speech data
- Ensure appropriate consent for audio recording
- Be aware of potential biases in speech recognition
## Technical Specifications
### System Requirements
- **Platform**: iOS 16.0+ or macOS 13.0+
- **Hardware**: Apple Silicon (M1/M2/M3) recommended
- **Memory**: Minimum 4GB RAM
- **Storage**: ~500MB for model files
### Model Files
- `AudioEncoder.mlmodelc/` - Audio encoder model
- `MelSpectrogram.mlmodelc/` - Mel spectrogram processor
- `TextDecoder.mlmodelc/` - Text decoder model
- `*.mlcomputeplan.json` - Compute plans for optimization
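A quick way to sanity-check a downloaded copy is to verify that the three `.mlmodelc` directories listed above are present. A minimal sketch (the helper name is illustrative, not part of any library):

```python
import os
import tempfile
from pathlib import Path

EXPECTED_DIRS = [
    "AudioEncoder.mlmodelc",
    "MelSpectrogram.mlmodelc",
    "TextDecoder.mlmodelc",
]

def check_model_folder(root):
    """Return the expected .mlmodelc directories missing from root."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]

# Example: a temporary folder containing only two of the three components
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "AudioEncoder.mlmodelc"))
    os.makedirs(os.path.join(tmp, "MelSpectrogram.mlmodelc"))
    missing = check_model_folder(tmp)
print(missing)  # -> ['TextDecoder.mlmodelc']
```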
## Usage Examples
### Basic Usage
The model is intended to be loaded through [WhisperKit](https://github.com/argmaxinc/WhisperKit) on iOS/macOS. A Swift sketch (the exact initializer arguments may vary across WhisperKit versions; the model path is a placeholder for wherever you store this repository's files):

```swift
import WhisperKit

// Load the CoreML model folder (local path to this repository's contents)
let pipe = try await WhisperKit(modelFolder: "path/to/Breeze-ASR-25_coreml")

// Transcribe an audio file
let results = try await pipe.transcribe(audioPath: "audio.wav")
print(results.map(\.text).joined())
```

### Advanced Usage
Decoding behavior such as language, task, and sampling temperature can be controlled through WhisperKit's `DecodingOptions`:

```swift
// With custom decoding parameters
let options = DecodingOptions(
    task: .transcribe,
    language: "en",
    temperature: 0.0
)
let results = try await pipe.transcribe(audioPath: "audio.wav", decodeOptions: options)
```
## Citation
```bibtex
@misc{breeze-asr-25-coreml,
  title        = {Breeze-ASR-25 CoreML: On-Device Speech Recognition for Apple Silicon},
  author       = {MediaTek Research},
  howpublished = {Hugging Face Model Hub},
  year         = {2024}
}
```
## Contact
For questions or issues related to this model, please contact MediaTek Research or create an issue in the model repository.