Qwen3-ASR 0.6B CoreML

Core ML conversion of Qwen/Qwen3-ASR-0.6B for on-device speech recognition on Apple platforms (iOS/macOS).

Model Variants

| Variant | Size | Description |
|---------|------|-------------|
| `f32/`  | ~2.5 GB | Full precision (Float32), highest accuracy |
| `int8/` | ~0.7 GB | Quantized (Int8), smaller and faster |

Features

  • 30+ languages, including English, Chinese, Japanese, and Korean
  • On-device inference - no internet required
  • Autoregressive decoder with KV-cache support
  • Processes audio in 1-second chunks (100 mel frames)
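The "1-second chunk = 100 mel frames" figure implies a frame rate of 100 mel frames per second of audio (a 10 ms hop, as in Whisper-style frontends; the exact frontend parameters are an assumption here, not stated above). A minimal sketch of the resulting chunking arithmetic:

```swift
/// Mel-frame chunking arithmetic, assuming 100 mel frames per second
/// of audio, which matches the "1-second chunk = 100 mel frames" figure.
let framesPerSecond = 100
let framesPerChunk = 100 // one 1-second chunk

func chunkCount(forDuration seconds: Double) -> Int {
    let totalFrames = Int((seconds * Double(framesPerSecond)).rounded(.up))
    // Ceiling division: a partial trailing chunk still counts as one chunk.
    return (totalFrames + framesPerChunk - 1) / framesPerChunk
}

print(chunkCount(forDuration: 3.0))  // 3 chunks
print(chunkCount(forDuration: 3.25)) // 4 chunks (last chunk is partial)
```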

Benchmarks (M4 Pro)

| Dataset | WER | CER | RTFx |
|---------|-----|-----|------|
| LibriSpeech test-clean (2620 files) | 4.4% | 1.9% | 2.8x |
| AISHELL-1 test (100 files) | 4.6% | 3.7% | 4.5x |

For reference, the official PyTorch model reports 2.11% WER on LibriSpeech test-clean.
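RTFx here is the inverse real-time factor: seconds of audio transcribed per second of wall-clock processing time, so higher is faster. A quick sketch of the computation (the example durations are illustrative, not measured values):

```swift
/// Inverse real-time factor: how many seconds of audio are transcribed
/// per second of wall-clock processing time. Higher is faster.
func rtfx(audioSeconds: Double, processingSeconds: Double) -> Double {
    audioSeconds / processingSeconds
}

// e.g. a 28-second clip transcribed in 10 seconds runs at 2.8x real time,
// matching the LibriSpeech figure above.
print(rtfx(audioSeconds: 28, processingSeconds: 10)) // 2.8
```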

Usage with FluidAudio

```swift
import FluidAudio

// Download and load the Core ML encoder/decoder models.
let manager = Qwen3AsrManager()
try await manager.loadModels()

// Resample the input file to the sample format the model expects.
let samples = try AudioConverter().resampleAudioFile(path: "audio.wav")
let transcript = try await manager.transcribe(
    audioSamples: samples,
    language: "en",
    maxNewTokens: 512
)
print(transcript)
```

Model Architecture

  • Encoder: audio encoder taking Whisper-style mel-spectrogram input
  • Decoder: 28-layer transformer decoder with a hidden size of 1024
  • Tokenizer: Qwen tokenizer with special ASR tokens
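The autoregressive decoder with KV-cache mentioned above can be sketched as a greedy decoding loop. The `DecoderStep` protocol and `ToyDecoder` below are hypothetical stand-ins for the Core ML decoder call, not the FluidAudio API:

```swift
/// Hypothetical single-step decoder interface standing in for the
/// Core ML decoder: given the previous token and a cache, it returns
/// logits over the vocabulary and an updated KV-cache.
protocol DecoderStep {
    associatedtype Cache
    func step(token: Int, cache: Cache?) -> (logits: [Float], cache: Cache)
}

/// Greedy autoregressive decoding: feed one token at a time, reusing
/// the KV-cache so each step avoids re-encoding the whole prefix.
func greedyDecode<D: DecoderStep>(
    decoder: D, startToken: Int, endToken: Int, maxNewTokens: Int
) -> [Int] {
    var tokens: [Int] = []
    var cache: D.Cache? = nil
    var current = startToken
    for _ in 0..<maxNewTokens {
        let (logits, newCache) = decoder.step(token: current, cache: cache)
        cache = newCache
        // Greedy: pick the argmax token.
        let next = logits.indices.max(by: { logits[$0] < logits[$1] })!
        if next == endToken { break }
        tokens.append(next)
        current = next
    }
    return tokens
}

// Toy decoder whose "cache" is just the step count; it deterministically
// emits tokens 1, 2, 3 and then the end token 0.
struct ToyDecoder: DecoderStep {
    func step(token: Int, cache: Int?) -> (logits: [Float], cache: Int) {
        let pos = (cache ?? 0) + 1
        var logits = [Float](repeating: 0, count: 4)
        logits[pos <= 3 ? pos : 0] = 1
        return (logits, pos)
    }
}

print(greedyDecode(decoder: ToyDecoder(), startToken: 0, endToken: 0,
                   maxNewTokens: 10)) // [1, 2, 3]
```

In the real model, the loop would stop at either the ASR end-of-sequence token or the `maxNewTokens` limit passed to `transcribe`.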

License

Apache 2.0, the same license as the original Qwen3-ASR model.

Citation

```bibtex
@article{qwen3asr,
  title   = {Qwen3-ASR Technical Report},
  author  = {Qwen Team},
  journal = {arXiv preprint arXiv:2601.21337},
  year    = {2025}
}
```

For the HuggingFace metadata UI, fill in:

  • License: Apache 2.0
  • Base model: Qwen/Qwen3-ASR-0.6B
  • Pipeline: automatic-speech-recognition
  • Library: coreml
  • Languages: en, zh, ja, ko, + others