Model Card for Breeze-ASR-25 CoreML

Model Details

  • Model Name: Breeze-ASR-25 CoreML
  • Model Type: Automatic Speech Recognition (ASR)
  • Format: CoreML (.mlmodelc)
  • Base Model: MediaTek-Research/Breeze-ASR-25
  • Developer: MediaTek Research
  • License: Apache 2.0

Model Description

Breeze-ASR-25 CoreML is an automatic speech recognition model for on-device inference on Apple Silicon. It was converted from the original PyTorch checkpoint of MediaTek-Research/Breeze-ASR-25 to CoreML format so that it can run with WhisperKit.

Intended Use

Primary Use Cases

  • Real-time speech-to-text transcription
  • On-device ASR applications
  • Mobile and desktop speech recognition
  • Privacy-preserving speech processing

Target Users

  • iOS/macOS developers
  • Mobile app developers
  • Researchers in speech processing
  • Companies requiring on-device ASR

Model Architecture

The model consists of three main components, listed in the order audio passes through them:

  1. MelSpectrogram: Converts raw audio input to a mel spectrogram representation
  2. AudioEncoder: Encodes the mel spectrogram into audio features
  3. TextDecoder: Generates the text transcription from the audio features
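To make the data flow through MelSpectrogram, AudioEncoder, and TextDecoder more concrete, the sketch below traces the array sizes each stage would produce. It is only an illustration: the constants (16 kHz audio, 30-second windows, a hop of 160 samples, 80 mel bins, and a 2x time reduction in the encoder) are assumed from the Whisper family of models and are not stated in this card, so they should be checked against the input descriptions of the converted models.

```python
# Assumed Whisper-style front-end constants (NOT stated in this model card).
SAMPLE_RATE = 16_000     # samples per second
WINDOW_SECONDS = 30      # audio is padded or trimmed to this length
HOP_LENGTH = 160         # samples between successive mel frames
N_MELS = 80              # mel bins per frame
ENCODER_STRIDE = 2       # time downsampling inside the encoder


def pipeline_shapes(duration_seconds: float) -> dict:
    """Return the assumed output size of each stage for one audio window."""
    # The clip is padded or trimmed to a fixed window before feature extraction.
    padded_samples = SAMPLE_RATE * WINDOW_SECONDS
    mel_frames = padded_samples // HOP_LENGTH
    encoder_positions = mel_frames // ENCODER_STRIDE
    return {
        "input_samples": round(duration_seconds * SAMPLE_RATE),
        "padded_samples": padded_samples,
        "mel_spectrogram": (N_MELS, mel_frames),   # MelSpectrogram output
        "encoder_positions": encoder_positions,    # AudioEncoder time steps
    }


shapes = pipeline_shapes(12.5)
print(shapes)
```

Under these assumptions the TextDecoder attends over a fixed number of encoder positions per window, regardless of how long the original clip was.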

Performance

Accuracy

  • High accuracy on various languages and accents
  • Optimized for conversational speech
  • Robust to background noise

Efficiency

  • Optimized for Apple Silicon (M1/M2/M3)
  • Low memory footprint
  • Fast inference speed
  • On-device processing (no internet required)

Training Data

Based on the original Breeze-ASR-25 training data, which includes:

  • Large-scale multilingual speech datasets
  • Various acoustic conditions
  • Multiple languages and accents

Limitations

  • Primarily optimized for Apple Silicon devices
  • Requires iOS 16.0+ or macOS 13.0+
  • Performance may vary on older Apple devices
  • Limited to supported languages in the base model

Ethical Considerations

  • The model should be used responsibly
  • Consider privacy implications of speech data
  • Ensure appropriate consent for audio recording
  • Be aware of potential biases in speech recognition

Technical Specifications

System Requirements

  • Platform: iOS 16.0+ or macOS 13.0+
  • Hardware: Apple Silicon (M1/M2/M3) recommended
  • Memory: Minimum 4GB RAM
  • Storage: ~500MB for model files
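The platform minimums above can be checked with a small version comparison before attempting to load the model. The following is a minimal sketch: the helper names `parse_version` and `meets_minimum` are hypothetical, and only the iOS 16.0 and macOS 13.0 thresholds come from this card.

```python
# Minimum OS versions taken from the requirements listed in this card.
MIN_VERSIONS = {"iOS": (16, 0), "macOS": (13, 0)}


def parse_version(text: str) -> tuple:
    """Turn a dotted version such as '13.2.1' into a tuple of ints."""
    parts = tuple(int(p) for p in text.split("."))
    # Pad to at least (major, minor) so that '13' compares equal to '13.0'.
    return parts + (0,) * max(0, 2 - len(parts))


def meets_minimum(platform_name: str, version: str) -> bool:
    """Return True if the platform is listed and the version is new enough."""
    minimum = MIN_VERSIONS.get(platform_name)
    if minimum is None:
        return False  # platform not listed in the requirements
    return parse_version(version) >= minimum


checks = {
    "macOS 13": meets_minimum("macOS", "13"),
    "macOS 14.2.1": meets_minimum("macOS", "14.2.1"),
    "iOS 15.7.1": meets_minimum("iOS", "15.7.1"),
    "watchOS 10.0": meets_minimum("watchOS", "10.0"),
}
print(checks)
```

On a Mac, the version string could be obtained from the standard library call `platform.mac_ver()[0]`; on iOS the check would normally be done in the app itself.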

Model Files

  • AudioEncoder.mlmodelc/ - Audio encoder model
  • MelSpectrogram.mlmodelc/ - Mel spectrogram processor
  • TextDecoder.mlmodelc/ - Text decoder model
  • *.mlcomputeplan.json - Compute plans for optimization
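Before loading, it can help to confirm that a downloaded folder contains the three compiled model bundles listed above. The sketch below uses only the Python standard library; the function name `missing_model_files` is hypothetical, and the temporary directory stands in for a real download location. The compute plan files are not checked here because their exact names are not given in this card.

```python
import tempfile
from pathlib import Path

# Compiled CoreML bundles named in this card; each one is a directory.
REQUIRED_BUNDLES = (
    "AudioEncoder.mlmodelc",
    "MelSpectrogram.mlmodelc",
    "TextDecoder.mlmodelc",
)


def missing_model_files(model_dir) -> list:
    """Return the required bundles that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED_BUNDLES if not (root / name).is_dir()]


# Demonstration with a temporary folder that lacks the text decoder.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "AudioEncoder.mlmodelc").mkdir()
    (Path(tmp) / "MelSpectrogram.mlmodelc").mkdir()
    missing = missing_model_files(tmp)

print(missing)
```

An empty list means all three bundles were found; anything else names the bundles that still need to be downloaded.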

Usage Examples

Basic Usage

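# Illustrative pseudocode: the load_model call is a placeholder and has not
# been verified against a published WhisperKit API.
import whisperkit

# Load model (replace "your-username" with the repository owner)
model = whisperkit.load_model("your-username/Breeze-ASR-25_coreml")

# Transcribe audio file
result = model.transcribe("audio.wav")
print(result.text)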

Advanced Usage

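# With custom parameters (reuses `model` from Basic Usage)
result = model.transcribe(
    "audio.wav",
    language="en",        # language code of the spoken audio
    task="transcribe",    # transcribe in the source language
    temperature=0.0       # 0.0 requests deterministic (greedy) decoding
)
print(result.text)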

Citation

@article{breeze-asr-25-coreml,
  title={Breeze-ASR-25 CoreML: On-Device Speech Recognition for Apple Silicon},
  author={MediaTek Research},
  journal={Hugging Face Model Hub},
  year={2024}
}

Contact

For questions or issues related to this model, please contact MediaTek Research or create an issue in the model repository.