---
license: apache-2.0
language:
  - en
  - zh
  - ja
  - ko
  - fr
  - de
  - es
tags:
  - speech
  - asr
  - coreml
  - qwen3
  - neural-engine
base_model: Qwen/Qwen3-ASR-0.6B
pipeline_tag: automatic-speech-recognition
---

# Qwen3-ASR-0.6B — CoreML

CoreML conversion of Qwen/Qwen3-ASR-0.6B for Apple Neural Engine.

Contains both the audio encoder and the text decoder, so the full pipeline runs on the Neural Engine (no GPU required).

## Models

| Model | Description | Quantization |
|---|---|---|
| `encoder.mlmodelc` | Audio encoder (mel → embeddings) | INT8 palettized |
| `embedding.mlmodelc` | Token embedding lookup | INT8 palettized |
| `decoder.mlmodelc` | Text decoder with KV cache (28 layers) | INT8 palettized |
| `encoder_int4.mlpackage` | Audio encoder source | INT4 palettized |
| `encoder_int8.mlpackage` | Audio encoder source | INT8 palettized |

## Usage

Full CoreML pipeline (encoder + decoder on Neural Engine):

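The repo's exact invocation isn't shown here; a minimal sketch of what the full-pipeline decode loop looks like, assuming coremltools and hypothetical feature names (`mel`, `audio_embeddings`, `logits`) that may differ from the shipped models:

```python
# Hedged sketch of the full-CoreML decode loop. The feature names
# ("mel", "audio_embeddings", "logits") and the start-token convention
# are assumptions, not the repo's actual interface.
import numpy as np

def greedy_decode(encode, decode_step, mel, eos_id, max_tokens=1024):
    """Run the encoder once, then feed the decoder one token at a time.

    `encode` and `decode_step` are plain callables so the same loop works
    with coremltools `MLModel.predict` wrappers or with test stubs.
    """
    audio_emb = encode(mel)       # one encoder pass over the mel features
    tokens, token = [], eos_id    # assume decoding starts from a BOS/EOS-style id
    for _ in range(max_tokens):   # the fixed KV cache caps decoding at 1024 tokens
        logits = decode_step(token, audio_emb)
        token = int(np.argmax(logits))
        if token == eos_id:
            break
        tokens.append(token)
    return tokens

# On macOS, the callables would wrap the shipped models, e.g.:
#   import coremltools as ct
#   enc = ct.models.MLModel("encoder_int8.mlpackage")
#   encode = lambda mel: enc.predict({"mel": mel})["audio_embeddings"]
```

Keeping the model calls behind callables also makes the loop unit-testable without the `.mlpackage` files present.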

Hybrid mode (CoreML encoder + MLX decoder on GPU):
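A hedged sketch of the hybrid handoff: the only real glue is converting the CoreML encoder's NumPy output into an MLX array for the GPU decoder. Paths and feature names below are assumptions:

```python
# Hedged sketch of hybrid mode: CoreML runs the encoder on the Neural
# Engine, MLX runs the decoder on the GPU. The bridge is just a
# NumPy -> mx.array conversion.
import numpy as np

def prepare_for_mlx(audio_embeddings) -> np.ndarray:
    """Normalize CoreML encoder output before handing it to MLX.

    coremltools returns NumPy arrays; mx.array() accepts a contiguous
    float32 array directly, so this cast is the whole bridge.
    """
    return np.ascontiguousarray(audio_embeddings, dtype=np.float32)

# On macOS the hybrid loop would look roughly like:
#   import coremltools as ct
#   import mlx.core as mx
#   enc = ct.models.MLModel("encoder_int8.mlpackage")      # runs on the ANE
#   emb = enc.predict({"mel": mel})["audio_embeddings"]
#   emb_mx = mx.array(prepare_for_mlx(emb))                # GPU decoder input
#   ... run the MLX text decoder on emb_mx ...
```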

## Architecture

- Audio encoder: 18-layer Whisper-style transformer (896 dim, 14 heads)
- Text decoder: 28 layers, 1024 hidden, 16 heads (8 KV heads)
- KV cache: fixed 1024 tokens via CoreML `MLState`
- Requires: macOS 15+ / iOS 18+ (full CoreML mode)
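The dimensions above pin down the fixed KV cache's memory footprint; a quick back-of-envelope check (fp16 cache dtype is an assumption):

```python
# KV cache footprint from the architecture bullets above.
# fp16 cache dtype is an assumption; all dimensions come from the card.
layers, hidden, heads, kv_heads, max_tokens = 28, 1024, 16, 8, 1024
head_dim = hidden // heads                      # 64
per_token = 2 * layers * kv_heads * head_dim    # K and V: 28,672 values/token
cache_bytes = per_token * 2 * max_tokens        # 2 bytes per fp16 value
print(head_dim, per_token, cache_bytes / 2**20)  # 64 28672 56.0 (MiB)
```

So the full 1024-token state costs about 56 MiB in fp16, which is why a fixed-size `MLState` buffer is practical here.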

## Links