# Model Card for Breeze-ASR-25 CoreML

## Model Details

- **Model Name**: Breeze-ASR-25 CoreML
- **Model Type**: Automatic Speech Recognition (ASR)
- **Format**: CoreML (.mlmodelc)
- **Base Model**: [MediaTek-Research/Breeze-ASR-25](https://huggingface.co/MediaTek-Research/Breeze-ASR-25)
- **Developer**: MediaTek Research
- **License**: Apache 2.0

## Model Description
Breeze-ASR-25 CoreML is a high-performance automatic speech recognition model optimized for Apple Silicon devices. It has been converted from the original PyTorch weights to CoreML format for efficient on-device inference with WhisperKit.
## Intended Use

### Primary Use Cases

- Real-time speech-to-text transcription
- On-device ASR applications
- Mobile and desktop speech recognition
- Privacy-preserving speech processing

### Target Users

- iOS/macOS developers
- Mobile app developers
- Researchers in speech processing
- Companies requiring on-device ASR
## Model Architecture

The model follows the standard Whisper pipeline and consists of three main components:

1. **MelSpectrogram**: Converts raw audio into a log-mel spectrogram representation
2. **AudioEncoder**: Encodes the mel spectrogram into acoustic feature representations
3. **TextDecoder**: Autoregressively generates the text transcription from the encoded features
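As a rough illustration of what the first stage computes, here is a simplified NumPy sketch of a Whisper-style log-mel front end, using the parameters commonly assumed for Whisper models (16 kHz audio, 400-point FFT, 160-sample hop, 80 mel bins). The actual `MelSpectrogram.mlmodelc` applies its own exact filterbank and scaling, so treat this only as a conceptual sketch:

```python
import numpy as np

SAMPLE_RATE = 16_000
N_FFT = 400   # 25 ms analysis window
HOP = 160     # 10 ms hop
N_MELS = 80

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels=N_MELS, n_fft=N_FFT, sr=SAMPLE_RATE):
    # Triangular mel filters spanning 0 Hz to the Nyquist frequency.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        for k in range(l, c):
            fb[i, k] = (k - l) / max(c - l, 1)  # rising slope
        for k in range(c, r):
            fb[i, k] = (r - k) / max(r - c, 1)  # falling slope
    return fb

def log_mel_spectrogram(audio):
    # Frame the signal, window, FFT, project onto mel filters, log-compress.
    window = np.hanning(N_FFT)
    n_frames = 1 + (len(audio) - N_FFT) // HOP
    frames = np.stack([audio[i * HOP:i * HOP + N_FFT] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = power @ mel_filterbank().T
    return np.log10(np.maximum(mel, 1e-10)).T  # shape: (n_mels, n_frames)

# One second of a 440 Hz tone as a stand-in for real speech audio.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
audio = 0.1 * np.sin(2 * np.pi * 440.0 * t).astype(np.float32)
mel = log_mel_spectrogram(audio)
print(mel.shape)  # (80, 98)
```

The AudioEncoder consumes a spectrogram of this shape, and the TextDecoder attends over the encoder output while emitting tokens one at a time.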
## Performance

### Accuracy

- Strong accuracy on Mandarin and English speech, including Mandarin-English code-switching (the focus of the base model)
- Optimized for conversational speech
- Robust to background noise

### Efficiency

- Optimized for Apple Silicon (M1/M2/M3)
- Low memory footprint
- Fast inference speed
- Fully on-device processing (no internet connection required)
## Training Data

Based on the original Breeze-ASR-25 training data, which includes:

- Large-scale multilingual speech datasets
- Various acoustic conditions
- Multiple languages and accents
## Limitations

- Primarily optimized for Apple Silicon devices
- Requires iOS 16.0+ or macOS 13.0+
- Performance may vary on older Apple devices
- Limited to the languages supported by the base model

## Ethical Considerations

- Use the model responsibly
- Consider the privacy implications of processing speech data
- Obtain appropriate consent before recording audio
- Be aware of potential biases in speech recognition systems
## Technical Specifications

### System Requirements

- **Platform**: iOS 16.0+ or macOS 13.0+
- **Hardware**: Apple Silicon (M1/M2/M3) recommended
- **Memory**: Minimum 4GB RAM
- **Storage**: ~500MB for model files

### Model Files

- `AudioEncoder.mlmodelc/` - Audio encoder model
- `MelSpectrogram.mlmodelc/` - Mel spectrogram processor
- `TextDecoder.mlmodelc/` - Text decoder model
- `*.mlcomputeplan.json` - Compute plans for optimization
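Before wiring the model into an app, a quick check can confirm that the compiled bundles listed above are present. The directory name below is a placeholder for wherever this repository was downloaded:

```python
from pathlib import Path

# Compiled CoreML bundles this model card lists.
EXPECTED = ["AudioEncoder.mlmodelc", "MelSpectrogram.mlmodelc", "TextDecoder.mlmodelc"]

def missing_components(model_dir):
    """Return the names of expected .mlmodelc bundles absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED if not (root / name).is_dir()]

# Placeholder path: point this at your local copy of the repository.
print(missing_components("Breeze-ASR-25_coreml"))
```

An empty list means all three components are in place; anything else names the bundles still to download.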
## Usage Examples

### Basic Usage

Inference runs through the [WhisperKit](https://github.com/argmaxinc/WhisperKit) Swift framework. The snippet below is a sketch based on WhisperKit's public API; exact initializer parameters and return types vary between releases, and the model folder path is a placeholder for wherever you downloaded this repository.

```swift
import WhisperKit

Task {
    // Point WhisperKit at the local folder containing the .mlmodelc bundles.
    let pipe = try await WhisperKit(modelFolder: "Breeze-ASR-25_coreml")

    // Transcribe an audio file entirely on-device.
    let result = try await pipe.transcribe(audioPath: "audio.wav")
    print(result?.text ?? "")
}
```

### Advanced Usage

Decoding behavior can be tuned through WhisperKit's `DecodingOptions` (again, field names follow the current API and may differ by version):

```swift
// Force English output with greedy (temperature 0) decoding.
let options = DecodingOptions(task: .transcribe, language: "en", temperature: 0.0)
let result = try await pipe.transcribe(audioPath: "audio.wav", decodeOptions: options)
```
## Citation

```bibtex
@misc{breeze-asr-25-coreml,
  title={Breeze-ASR-25 CoreML: On-Device Speech Recognition for Apple Silicon},
  author={MediaTek Research},
  howpublished={Hugging Face Model Hub},
  year={2024}
}
```
## Contact

For questions or issues related to this model, please contact MediaTek Research or open an issue in the model repository.