Paper: Qwen3-ASR Technical Report (arXiv:2601.21337)
Core ML conversion of Qwen/Qwen3-ASR-0.6B for on-device speech recognition on Apple platforms (iOS/macOS).
| Variant | Size | Description |
|---|---|---|
| `f32/` | ~2.5 GB | Full precision (Float32), highest accuracy |
| `int8/` | ~0.7 GB | Quantized (Int8), smaller and faster |
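The two sizes in the table are roughly what the storage width per weight predicts. The sketch below is back-of-the-envelope arithmetic only. It assumes about 0.6 billion weights, a figure inferred from the "0.6B" in the model name and not an exact count, and `estimatedGigabytes` is a hypothetical helper, not part of any library. The sizes listed above are somewhat larger than these estimates, presumably because the parameter count is rounded and the packages contain more than raw weights.

```swift
import Foundation

// Assumption: ~0.6 billion weights, inferred from the "0.6B" in the model name.
let assumedParameterCount = 0.6e9

/// Rough weight-storage size in GB for a given number of bytes per weight.
/// Hypothetical helper for illustration; ignores metadata and any layers
/// that are kept at a different precision.
func estimatedGigabytes(bytesPerWeight: Double) -> Double {
    assumedParameterCount * bytesPerWeight / 1e9
}

let float32Estimate = estimatedGigabytes(bytesPerWeight: 4)  // Float32: 4 bytes per weight
let int8Estimate = estimatedGigabytes(bytesPerWeight: 1)     // Int8: 1 byte per weight

print(String(format: "f32  estimate: %.1f GB", float32Estimate))
print(String(format: "int8 estimate: %.1f GB", int8Estimate))
```

Under these assumptions, the Int8 variant needs about a quarter of the weight storage of the Float32 variant, which is in line with the ~2.5 GB and ~0.7 GB entries.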
| Dataset | WER | CER | RTFx |
|---|---|---|---|
| LibriSpeech test-clean (2620 files) | 4.4% | 1.9% | 2.8x |
| AISHELL-1 test (100 files) | 4.6% | 3.7% | 4.5x |
Official PyTorch model: 2.11% WER on LibriSpeech test-clean
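For readers unfamiliar with the column headers, the following is a minimal sketch of how WER, CER, and RTFx are conventionally defined. It is not the evaluation script behind the table above. The function names are hypothetical, and the text normalization (lowercasing, whitespace splitting) is an assumption; the normalization used for the reported numbers is not stated here. Corpus-level scores are usually computed by summing edits and reference lengths over all files, not by averaging per-file rates.

```swift
import Foundation

/// Levenshtein edit distance (substitutions, insertions, deletions)
/// between two token sequences, using two rolling rows.
func editDistance<T: Equatable>(_ ref: [T], _ hyp: [T]) -> Int {
    if ref.isEmpty { return hyp.count }
    if hyp.isEmpty { return ref.count }
    var prev = Array(0...hyp.count)
    for i in 1...ref.count {
        var curr = [i] + Array(repeating: 0, count: hyp.count)
        for j in 1...hyp.count {
            let cost = ref[i - 1] == hyp[j - 1] ? 0 : 1
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost)
        }
        prev = curr
    }
    return prev[hyp.count]
}

/// Word error rate: word-level edits divided by reference word count.
/// Assumed normalization: lowercase, split on spaces.
func wordErrorRate(reference: String, hypothesis: String) -> Double {
    let ref = reference.lowercased().split(separator: " ").map(String.init)
    let hyp = hypothesis.lowercased().split(separator: " ").map(String.init)
    guard !ref.isEmpty else { return hyp.isEmpty ? 0 : 1 }
    return Double(editDistance(ref, hyp)) / Double(ref.count)
}

/// Character error rate: character-level edits divided by reference length,
/// with whitespace removed (a common choice for Mandarin test sets).
func characterErrorRate(reference: String, hypothesis: String) -> Double {
    let ref = Array(reference.filter { !$0.isWhitespace })
    let hyp = Array(hypothesis.filter { !$0.isWhitespace })
    guard !ref.isEmpty else { return hyp.isEmpty ? 0 : 1 }
    return Double(editDistance(ref, hyp)) / Double(ref.count)
}

/// Inverse real-time factor: seconds of audio processed per second of compute.
/// Values above 1 mean faster than real time.
func realTimeFactorX(audioSeconds: Double, processingSeconds: Double) -> Double {
    audioSeconds / processingSeconds
}

let wer = wordErrorRate(reference: "the cat sat on the mat",
                        hypothesis: "the cat sat on mat")
let rtfx = realTimeFactorX(audioSeconds: 10, processingSeconds: 4)
print(wer, rtfx)
```

In the usage lines, one dropped word out of six reference words gives a WER of 1/6, and 10 seconds of audio transcribed in 4 seconds gives an RTFx of 2.5.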
```swift
import FluidAudio

let manager = Qwen3AsrManager()
try await manager.loadModels()

let samples = try AudioConverter().resampleAudioFile(path: "audio.wav")
let transcript = try await manager.transcribe(
    audioSamples: samples,
    language: "en",
    maxNewTokens: 512
)
print(transcript)
```
Model Architecture
License
Apache 2.0 - Same as the original Qwen3-ASR model.
Credits
Citation
```bibtex
@article{qwen3asr,
  title={Qwen3-ASR Technical Report},
  author={Qwen Team},
  journal={arXiv preprint arXiv:2601.21337},
  year={2026}
}
```
For the HuggingFace metadata UI, set the base model to Qwen/Qwen3-ASR-0.6B.