# Model Card for Breeze-ASR-25 CoreML

## Model Details
- Model Name: Breeze-ASR-25 CoreML
- Model Type: Automatic Speech Recognition (ASR)
- Format: CoreML (.mlmodelc)
- Base Model: MediaTek-Research/Breeze-ASR-25
- Developer: MediaTek Research
- License: Apache 2.0
## Model Description

Breeze-ASR-25 CoreML is an automatic speech recognition model optimized for Apple Silicon devices. It has been converted from the original PyTorch weights to CoreML format for efficient on-device inference with WhisperKit.
## Intended Use

### Primary Use Cases
- Real-time speech-to-text transcription
- On-device ASR applications
- Mobile and desktop speech recognition
- Privacy-preserving speech processing
### Target Users
- iOS/macOS developers
- Mobile app developers
- Researchers in speech processing
- Companies requiring on-device ASR
## Model Architecture

The model consists of three main components, applied in sequence:

- MelSpectrogram: Converts raw audio into a log-mel spectrogram representation
- AudioEncoder: Encodes the mel spectrogram into acoustic feature representations
- TextDecoder: Autoregressively generates the text transcription from the encoded features
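This mirrors the standard Whisper-style pipeline. As a rough illustration of what the MelSpectrogram stage computes, here is a minimal NumPy sketch assuming Whisper-style front-end parameters (16 kHz input, 400-sample FFT window, 160-sample hop, 80 mel bins); the converted model's exact settings may differ.

```python
import numpy as np

# Assumed Whisper-style front-end parameters (not read from the CoreML model)
SAMPLE_RATE, N_FFT, HOP, N_MELS = 16_000, 400, 160, 80

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    hz_pts = 700.0 * (10.0 ** (mel_pts / 2595.0) - 1.0)
    bins = np.floor((n_fft + 1) * hz_pts / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb

def log_mel_spectrogram(audio):
    # Frame the signal, apply a Hann window, and take the magnitude STFT.
    window = np.hanning(N_FFT)
    n_frames = 1 + (len(audio) - N_FFT) // HOP
    frames = np.stack([audio[i * HOP : i * HOP + N_FFT] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=N_FFT)) ** 2
    mel = power @ mel_filterbank(SAMPLE_RATE, N_FFT, N_MELS).T
    return np.log10(np.maximum(mel, 1e-10)).T  # shape: (n_mels, n_frames)

# One second of noise yields an (80, 98) feature matrix with these parameters.
features = log_mel_spectrogram(np.random.default_rng(0).normal(size=SAMPLE_RATE))
print(features.shape)
```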
## Performance

### Accuracy
- High accuracy on various languages and accents
- Optimized for conversational speech
- Robust to background noise
### Efficiency
- Optimized for Apple Silicon (M1/M2/M3)
- Low memory footprint
- Fast inference speed
- On-device processing (no internet required)
## Training Data
Based on the original Breeze-ASR-25 training data, which includes:
- Large-scale multilingual speech datasets
- Various acoustic conditions
- Multiple languages and accents
## Limitations
- Primarily optimized for Apple Silicon devices
- Requires iOS 16.0+ or macOS 13.0+
- Performance may vary on older Apple devices
- Limited to supported languages in the base model
## Ethical Considerations
- The model should be used responsibly
- Consider privacy implications of speech data
- Ensure appropriate consent for audio recording
- Be aware of potential biases in speech recognition
## Technical Specifications

### System Requirements
- Platform: iOS 16.0+ or macOS 13.0+
- Hardware: Apple Silicon (M1/M2/M3) recommended
- Memory: Minimum 4GB RAM
- Storage: ~500MB for model files
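An app can sanity-check the OS requirement above at runtime before loading the model. The helper below is a hypothetical sketch (not part of any SDK), assuming the version string comes from something like Python's `platform.mac_ver()`:

```python
# Hypothetical check against the macOS 13.0+ requirement; the helper name
# and the two-component comparison are illustrative assumptions.
def meets_min_os(ver_str, minimum=(13, 0)):
    parts = tuple(int(p) for p in (ver_str.split(".") + ["0"])[:2])
    return parts >= minimum

print(meets_min_os("13.4.1"))  # True
print(meets_min_os("12.6"))    # False
```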
### Model Files

- `AudioEncoder.mlmodelc/` - Audio encoder model
- `MelSpectrogram.mlmodelc/` - Mel spectrogram processor
- `TextDecoder.mlmodelc/` - Text decoder model
- `*.mlcomputeplan.json` - Compute plans for optimization
## Usage Examples

### Basic Usage

WhisperKit is a Swift framework, so the snippet below is a Swift sketch. The exact initializer and return types vary across WhisperKit versions, and the repository id is a placeholder; run it from an async context.

```swift
import WhisperKit

// Load the model (repo id is a placeholder)
let pipe = try await WhisperKit(model: "your-username/Breeze-ASR-25_coreml")

// Transcribe an audio file; recent WhisperKit versions return an array of results
let results = try await pipe.transcribe(audioPath: "audio.wav")
print(results.map(\.text).joined(separator: " "))
```

### Advanced Usage

Custom decoding parameters can be passed via `DecodingOptions` (field names shown are from recent WhisperKit releases and may differ in other versions):

```swift
import WhisperKit

// With custom decoding options (repo id is a placeholder)
let pipe = try await WhisperKit(model: "your-username/Breeze-ASR-25_coreml")
let options = DecodingOptions(task: .transcribe, language: "en", temperature: 0.0)
let results = try await pipe.transcribe(audioPath: "audio.wav", decodeOptions: options)
```
## Citation

```bibtex
@misc{breeze-asr-25-coreml,
  title={Breeze-ASR-25 CoreML: On-Device Speech Recognition for Apple Silicon},
  author={MediaTek Research},
  howpublished={Hugging Face Model Hub},
  year={2024}
}
```
## Contact
For questions or issues related to this model, please contact MediaTek Research or create an issue in the model repository.