license: apache-2.0
language:
  - en
  - zh
  - ja
  - ko
  - vi
  - th
  - id
  - ms
  - hi
  - ar
  - tr
  - ru
  - de
  - fr
  - es
  - multilingual
tags:
  - speech-recognition
  - asr
  - coreml
  - apple
  - ios
  - macos
  - qwen
  - audio
library_name: coreml
pipeline_tag: automatic-speech-recognition
base_model: Qwen/Qwen3-ASR-0.6B

Qwen3-ASR 0.6B CoreML

Core ML conversion of Qwen/Qwen3-ASR-0.6B for on-device speech recognition on Apple platforms (iOS/macOS).

Model Variants

| Variant | Size | Description |
|---------|------|-------------|
| `f32/` | ~2.5 GB | Full precision (Float32), highest accuracy |
| `int8/` | ~0.7 GB | Quantized (Int8), smaller and faster |

Features

  • 30+ languages, including English, Chinese, Japanese, and Korean
  • On-device inference - no internet required
  • Autoregressive decoder with KV-cache support
  • Processes audio in 1-second chunks (100 mel frames)
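The chunking arithmetic can be sketched as follows. This is a minimal illustration, assuming 16 kHz input audio and a 10 ms mel hop (160 samples), which is what would make 100 mel frames equal one second. Neither value is stated in this card, and the helper name `chunkCount` is hypothetical.

```swift
import Foundation

// Assumed values (not stated in this model card): 16 kHz audio, 10 ms mel hop.
let sampleRate = 16_000
let hopLength = 160                                // samples per mel frame at a 10 ms hop
let framesPerChunk = 100                           // from the feature list: 100 mel frames per chunk
let samplesPerChunk = hopLength * framesPerChunk   // 16_000 samples
let secondsPerChunk = Double(samplesPerChunk) / Double(sampleRate)

/// Number of chunks needed to cover `sampleCount` samples.
/// A trailing partial chunk counts as a full one (it would need padding).
func chunkCount(forSampleCount sampleCount: Int) -> Int {
    guard sampleCount > 0 else { return 0 }
    return (sampleCount + samplesPerChunk - 1) / samplesPerChunk
}

let sampleCount = 56_000   // 3.5 seconds of audio under the assumptions above
print("chunk length: \(secondsPerChunk) s, chunks needed: \(chunkCount(forSampleCount: sampleCount))")
```

Under these assumptions a 3.5-second clip needs four chunks, the last of which is half padding.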

Benchmarks (M4 Pro)

| Dataset | WER | CER | RTFx |
|---------|-----|-----|------|
| LibriSpeech test-clean (2620 files) | 4.4% | 1.9% | 2.8x |
| AISHELL-1 test (100 files) | 4.6% | 3.7% | 4.5x |

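RTFx is the inverse real-time factor (seconds of audio processed per second of compute), so higher is faster. For reference, the official PyTorch model reports 2.11% WER on LibriSpeech test-clean.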
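For readers unfamiliar with the metrics, here is a rough sketch of how WER and RTFx are conventionally computed. It is not the benchmark harness used for the numbers above: the text normalization here (lowercasing and splitting on spaces) is an assumption, and all function names are illustrative.

```swift
import Foundation

/// Levenshtein distance between a reference and a hypothesis sequence.
func editDistance<T: Equatable>(_ ref: [T], _ hyp: [T]) -> Int {
    if ref.isEmpty { return hyp.count }
    if hyp.isEmpty { return ref.count }
    var previous = Array(0...hyp.count)
    for (i, r) in ref.enumerated() {
        var current = [i + 1] + Array(repeating: 0, count: hyp.count)
        for (j, h) in hyp.enumerated() {
            let substitution = previous[j] + (r == h ? 0 : 1)
            let deletion = previous[j + 1] + 1    // reference item missing from hypothesis
            let insertion = current[j] + 1        // extra item in hypothesis
            current[j + 1] = min(substitution, deletion, insertion)
        }
        previous = current
    }
    return previous[hyp.count]
}

/// WER = word-level edits / number of reference words.
func wordErrorRate(reference: String, hypothesis: String) -> Double {
    let ref = reference.lowercased().split(separator: " ").map(String.init)
    let hyp = hypothesis.lowercased().split(separator: " ").map(String.init)
    guard !ref.isEmpty else { return hyp.isEmpty ? 0 : 1 }
    return Double(editDistance(ref, hyp)) / Double(ref.count)
}

/// RTFx = audio duration / processing time.
func rtfx(audioSeconds: Double, processingSeconds: Double) -> Double {
    audioSeconds / processingSeconds
}

// One substitution (sat -> sit) and one deletion (the) over six reference words.
let wer = wordErrorRate(reference: "the cat sat on the mat", hypothesis: "the cat sit on mat")
print(wer)
print(rtfx(audioSeconds: 10, processingSeconds: 2.5))
```

CER follows the same pattern with `Array(reference)` and `Array(hypothesis)` as the character sequences passed to `editDistance`.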

Usage with FluidAudio

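import FluidAudio

// Load the Core ML models.
let manager = Qwen3AsrManager()
try await manager.loadModels()

// Read the audio file and resample it for the model.
let samples = try AudioConverter().resampleAudioFile(path: "audio.wav")

// Transcribe with a language hint; maxNewTokens caps the decoded length.
let transcript = try await manager.transcribe(
    audioSamples: samples,
    language: "en",
    maxNewTokens: 512
)
print(transcript)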

Model Architecture

  • Encoder: Audio encoder (Whisper-style mel spectrogram input)
  • Decoder: 28-layer transformer decoder with 1024 hidden size
  • Tokenizer: Qwen tokenizer with special ASR tokens
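To illustrate what autoregressive decoding with a KV cache means in practice, here is a toy greedy decoding loop. It is a sketch only: `KVCache`, `greedyDecode`, and the `step` closure are hypothetical stand-ins, and the actual Core ML model interface is not described in this card.

```swift
/// Hypothetical cache: one key/value entry is appended per processed token,
/// so earlier tokens never have to be re-encoded.
struct KVCache {
    var keys: [[Float]] = []
    var values: [[Float]] = []
    var length: Int { keys.count }
}

/// Greedy autoregressive decoding. `step` stands in for one decoder forward pass:
/// it takes only the newest token plus the cache, appends to the cache,
/// and returns logits over the vocabulary.
func greedyDecode(
    prompt: [Int],
    eosToken: Int,
    maxNewTokens: Int,
    step: (_ token: Int, _ cache: inout KVCache) -> [Float]
) -> [Int] {
    precondition(!prompt.isEmpty, "prompt must contain at least one token")
    var cache = KVCache()
    var logits: [Float] = []
    // Prefill: run the prompt through the decoder so the cache holds its keys/values.
    for token in prompt { logits = step(token, &cache) }

    var generated: [Int] = []
    while generated.count < maxNewTokens {
        // Greedy choice: pick the index of the largest logit.
        guard let next = logits.indices.max(by: { logits[$0] < logits[$1] }) else { break }
        if next == eosToken { break }
        generated.append(next)
        logits = step(next, &cache)
    }
    return generated
}

// Toy "model" with a 5-token vocabulary: always predicts token + 1, capped at 4 (EOS).
let toyStep: (Int, inout KVCache) -> [Float] = { token, cache in
    cache.keys.append([Float(token)])
    cache.values.append([Float(token)])
    var logits = [Float](repeating: 0, count: 5)
    logits[min(token + 1, 4)] = 1
    return logits
}

let output = greedyDecode(prompt: [0], eosToken: 4, maxNewTokens: 512, step: toyStep)
print(output)
```

The `maxNewTokens` argument plays the same role as the parameter of that name in the usage example: it bounds the loop when no end-of-sequence token is produced.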

License

Apache 2.0 - Same as the original Qwen3-ASR model.

Credits

Citation

@article{qwen3asr,
  title={Qwen3-ASR Technical Report},
  author={Qwen Team},
  journal={arXiv preprint arXiv:2601.21337},
  year={2026}
}
