metadata
license: apache-2.0
language:
- en
- zh
- ja
- ko
- vi
- th
- id
- ms
- hi
- ar
- tr
- ru
- de
- fr
- es
- multilingual
tags:
- speech-recognition
- asr
- coreml
- apple
- ios
- macos
- qwen
- audio
library_name: coreml
pipeline_tag: automatic-speech-recognition
base_model: Qwen/Qwen3-ASR-0.6B
Qwen3-ASR 0.6B CoreML
Core ML conversion of Qwen/Qwen3-ASR-0.6B for on-device speech recognition on Apple platforms (iOS/macOS).
Model Variants
| Variant | Size | Description |
|---|---|---|
| f32/ | ~2.5 GB | Full precision (Float32) - highest accuracy |
| int8/ | ~0.7 GB | Quantized (Int8) - smaller, faster |
Features
- 30+ languages, including English, Chinese, Japanese, and Korean
- On-device inference - no internet required
- Autoregressive decoder with KV-cache support
- Processes audio in 1-second chunks (100 mel frames)
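
To make the chunking concrete, here is a minimal sketch of splitting a sample buffer into 1-second chunks. It assumes 16 kHz mono input (the card does not state the sample rate), under which 100 mel frames per second corresponds to a 160-sample hop. The function name `splitIntoChunks` and the zero-padding of the final chunk are illustrative choices, not the FluidAudio implementation.

```swift
// Assumption: 16 kHz mono audio. The model card does not state the sample rate.
let assumedSampleRate = 16_000
let melFramesPerChunk = 100
// Under that assumption, each mel frame advances by 160 samples (10 ms).
let assumedHopLength = assumedSampleRate / melFramesPerChunk

/// Splits `samples` into consecutive chunks of `chunkSize` samples.
/// The last chunk is zero-padded so every chunk has the same length
/// (hypothetical behavior chosen for this sketch).
func splitIntoChunks(_ samples: [Float], chunkSize: Int) -> [[Float]] {
    precondition(chunkSize > 0, "chunkSize must be positive")
    var chunks: [[Float]] = []
    var start = 0
    while start < samples.count {
        let end = min(start + chunkSize, samples.count)
        var chunk = Array(samples[start..<end])
        if chunk.count < chunkSize {
            chunk.append(contentsOf: [Float](repeating: 0, count: chunkSize - chunk.count))
        }
        chunks.append(chunk)
        start = end
    }
    return chunks
}

// 2.5 seconds of placeholder audio at the assumed sample rate.
let exampleSamples = [Float](repeating: 0.1, count: 40_000)
let chunks = splitIntoChunks(exampleSamples, chunkSize: assumedSampleRate)
print(chunks.count)      // 3
print(assumedHopLength)  // 160
```

The final chunk here holds 8,000 real samples followed by 8,000 zeros. A real pipeline may handle the tail differently.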
Benchmarks (M4 Pro)
| Dataset | WER | CER | RTFx |
|---|---|---|---|
| LibriSpeech test-clean (2620 files) | 4.4% | 1.9% | 2.8x |
| AISHELL-1 test (100 files) | 4.6% | 3.7% | 4.5x |
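For reference, the official PyTorch model scores 2.11% WER on LibriSpeech test-clean, so the Core ML conversion's 4.4% is about 2.3 points higher. RTFx is the inverse real-time factor (audio duration divided by processing time), so higher is faster.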
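
As a rough illustration of how the metrics in the table are typically defined, the sketch below computes WER and CER from a Levenshtein edit distance and RTFx as audio duration over processing time. It assumes RTFx means the inverse real-time factor. All function names are hypothetical, and the text normalization used for the reported numbers is not shown, so this sketch will not reproduce them exactly.

```swift
/// Levenshtein distance between two token sequences
/// (counts substitutions, insertions, and deletions).
func editDistance<T: Equatable>(_ ref: [T], _ hyp: [T]) -> Int {
    if ref.isEmpty { return hyp.count }
    if hyp.isEmpty { return ref.count }
    var previous = Array(0...hyp.count)
    for i in 1...ref.count {
        var current = [i] + [Int](repeating: 0, count: hyp.count)
        for j in 1...hyp.count {
            let cost = ref[i - 1] == hyp[j - 1] ? 0 : 1
            current[j] = min(previous[j] + 1,        // deletion
                             current[j - 1] + 1,     // insertion
                             previous[j - 1] + cost) // match or substitution
        }
        previous = current
    }
    return previous[hyp.count]
}

/// Word error rate: word-level edit distance divided by reference word count.
func wordErrorRate(reference: String, hypothesis: String) -> Double {
    let ref = reference.split(separator: " ").map(String.init)
    let hyp = hypothesis.split(separator: " ").map(String.init)
    precondition(!ref.isEmpty, "reference must contain at least one word")
    return Double(editDistance(ref, hyp)) / Double(ref.count)
}

/// Character error rate: character-level edit distance divided by reference length.
func characterErrorRate(reference: String, hypothesis: String) -> Double {
    let ref = Array(reference)
    let hyp = Array(hypothesis)
    precondition(!ref.isEmpty, "reference must not be empty")
    return Double(editDistance(ref, hyp)) / Double(ref.count)
}

/// Inverse real-time factor: seconds of audio processed per second of compute.
func rtfx(audioSeconds: Double, processingSeconds: Double) -> Double {
    precondition(processingSeconds > 0, "processingSeconds must be positive")
    return audioSeconds / processingSeconds
}

let wer = wordErrorRate(reference: "the cat sat on the mat",
                        hypothesis: "the cat sat on mat")
print(wer)                                              // one deletion over six words
print(rtfx(audioSeconds: 10.0, processingSeconds: 2.5)) // 4.0
```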
Usage with FluidAudio
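import FluidAudio

// Load the models before transcribing.
let manager = Qwen3AsrManager()
try await manager.loadModels()

// Read the audio file and resample it for the model.
let samples = try AudioConverter().resampleAudioFile(path: "audio.wav")

// Transcribe, generating at most 512 new tokens.
let transcript = try await manager.transcribe(
    audioSamples: samples,
    language: "en",
    maxNewTokens: 512
)
print(transcript)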
Model Architecture
- Encoder: Audio encoder (Whisper-style mel spectrogram input)
- Decoder: 28-layer transformer decoder with 1024 hidden size
- Tokenizer: Qwen tokenizer with special ASR tokens
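
To show what autoregressive decoding with a KV cache looks like in outline, here is a toy greedy decoding loop. Everything in it is a hypothetical stand-in: `KVCache`, `DecoderStep`, `greedyDecode`, and the five-token vocabulary are invented for illustration and are not the FluidAudio or Core ML interfaces. The point is only that after the prompt is processed once, each step feeds a single new token and reuses the cached state, stopping at an end-of-sequence token or after `maxNewTokens`.

```swift
/// Hypothetical stand-in for a decoder KV cache. A real cache would hold
/// key/value tensors for each of the decoder layers, not token ids.
struct KVCache {
    var entries: [Int] = []
}

/// Hypothetical single decoder step: consumes the newest token, updates the
/// cache in place, and returns logits over the vocabulary.
typealias DecoderStep = (_ token: Int, _ cache: inout KVCache) -> [Float]

/// Greedy autoregressive decoding: always picks the highest-scoring token.
func greedyDecode(promptTokens: [Int],
                  eosToken: Int,
                  maxNewTokens: Int,
                  step: DecoderStep) -> [Int] {
    precondition(!promptTokens.isEmpty, "prompt must contain at least one token")
    var cache = KVCache()
    var logits: [Float] = []
    // Prefill: run the prompt through once so its state lands in the cache.
    for token in promptTokens {
        logits = step(token, &cache)
    }
    var generated: [Int] = []
    while generated.count < maxNewTokens {
        // Greedy choice: index of the largest logit.
        guard let next = logits.indices.max(by: { logits[$0] < logits[$1] }) else { break }
        if next == eosToken { break }
        generated.append(next)
        // Only the new token is fed; earlier tokens are served from the cache.
        logits = step(next, &cache)
    }
    return generated
}

// Toy decoder: predicts a token id equal to the cache length, capped at 4 (EOS).
let toyStep: DecoderStep = { token, cache in
    cache.entries.append(token)
    var logits = [Float](repeating: 0, count: 5)
    logits[min(cache.entries.count, 4)] = 1
    return logits
}

let decoded = greedyDecode(promptTokens: [0], eosToken: 4, maxNewTokens: 512, step: toyStep)
print(decoded) // [1, 2, 3]
```

Without the cache, every step would have to reprocess the full token history, so the cost per generated token would grow with sequence length.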
License
Apache 2.0 - Same as the original Qwen3-ASR model.
Credits
- Original model: https://huggingface.co/Qwen/Qwen3-ASR-0.6B by Alibaba Qwen Team
- Paper: https://arxiv.org/abs/2601.21337
- CoreML conversion: https://github.com/FluidInference/FluidAudio
Citation
@article{qwen3asr,
  title={Qwen3-ASR Technical Report},
  author={Qwen Team},
  journal={arXiv preprint arXiv:2601.21337},
  year={2025}
}