Qwen3-ASR-0.6B — CoreML

CoreML conversion of Qwen/Qwen3-ASR-0.6B for the Apple Neural Engine.

Contains both the audio encoder and the text decoder, so the full pipeline can run on the Neural Engine (no GPU required).
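
To illustrate how an encoder, an embedding lookup, and a stateful decoder can fit together, here is a minimal data-flow sketch in Python. Everything in it is a hypothetical stand-in: the functions `encoder`, `embedding`, `transcribe`, the `Decoder` class, the toy sizes, the `EOS` id, and the order in which audio and prompt embeddings are fed are assumptions for illustration, not this repository's API or the model's actual prompt format.

```python
from typing import List

Vector = List[float]
DIM = 4      # toy embedding width (hypothetical; the real decoder is 1024 wide)
VOCAB = 6    # toy vocabulary size (hypothetical)
EOS = 0      # assumed end-of-sequence token id (hypothetical)

def encoder(mel_frames: List[Vector]) -> List[Vector]:
    """Stand-in for the audio encoder: mel frames -> audio embeddings."""
    return [[sum(frame) / len(frame)] * DIM for frame in mel_frames]

def embedding(token_id: int) -> Vector:
    """Stand-in for the token embedding lookup: token id -> vector."""
    return [float(token_id)] * DIM

class Decoder:
    """Stand-in for the text decoder: one embedding per step, state kept inside."""

    def __init__(self, max_tokens: int = 1024):
        self.max_tokens = max_tokens
        self.cache: List[Vector] = []  # plays the role of the KV cache

    def step(self, x: Vector) -> List[float]:
        if len(self.cache) >= self.max_tokens:
            raise RuntimeError("KV cache is full")
        self.cache.append(x)
        # Toy logits: favour the token whose id equals cache length mod VOCAB.
        target = len(self.cache) % VOCAB
        return [1.0 if i == target else 0.0 for i in range(VOCAB)]

def transcribe(mel_frames: List[Vector], prompt_ids: List[int],
               max_new_tokens: int = 16) -> List[int]:
    decoder = Decoder()
    logits = None
    # 1. Prefill: audio embeddings first, then prompt token embeddings (assumed order).
    for vec in encoder(mel_frames):
        logits = decoder.step(vec)
    for token_id in prompt_ids:
        logits = decoder.step(embedding(token_id))
    if logits is None:
        raise ValueError("need at least one audio frame or prompt token")
    # 2. Greedy decoding: pick the argmax token, embed it, feed it back.
    output: List[int] = []
    for _ in range(max_new_tokens):
        next_id = max(range(VOCAB), key=lambda i: logits[i])
        if next_id == EOS:
            break
        output.append(next_id)
        logits = decoder.step(embedding(next_id))
    return output
```

The point of the sketch is the shape of the loop: the encoder runs once per utterance, while the embedding lookup and the decoder run once per generated token, with the decoder carrying its state between steps.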

Models

Model	Description	Quantization
`encoder.mlmodelc`	Audio encoder (mel → embeddings)	INT8 palettized
`embedding.mlmodelc`	Token embedding lookup	INT8 palettized
`decoder.mlmodelc`	Text decoder with KV cache (28 layers)	INT8 palettized
`encoder_int4.mlpackage`	Audio encoder source	INT4 palettized
`encoder_int8.mlpackage`	Audio encoder source	INT8 palettized
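
As a rough illustration of what "palettized" means in the table above: each weight is stored as a small index into a lookup table, with 2^8 = 256 entries for INT8 and 2^4 = 16 entries for INT4. The sketch below uses an evenly spaced table, which is a simplifying assumption; conversion tools typically choose table entries by clustering the weights (for example with k-means). The function names `palettize` and `depalettize` are hypothetical.

```python
from typing import List, Tuple

def palettize(weights: List[float], nbits: int) -> Tuple[List[float], List[int]]:
    """Return (lookup_table, indices) using an evenly spaced table (assumption)."""
    n_entries = 2 ** nbits
    lo, hi = min(weights), max(weights)
    if hi == lo:
        # Degenerate case: every weight is identical, so one entry suffices.
        return [lo] * n_entries, [0] * len(weights)
    step = (hi - lo) / (n_entries - 1)
    lut = [lo + i * step for i in range(n_entries)]
    # Map each weight to the nearest table entry, clamped to the valid range.
    indices = [min(n_entries - 1, max(0, round((w - lo) / step))) for w in weights]
    return lut, indices

def depalettize(lut: List[float], indices: List[int]) -> List[float]:
    """Reconstruct approximate weights from the table and the indices."""
    return [lut[i] for i in indices]
```

With this scheme each weight costs `nbits` bits plus a shared table, so an INT4 variant stores roughly half as many index bits as an INT8 one, at the price of a coarser table.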

Usage

Full CoreML pipeline (encoder + decoder on Neural Engine):

Hybrid mode (CoreML encoder + MLX decoder on GPU):

Architecture

Audio encoder: 18-layer Whisper-style transformer (896 dim, 14 heads)
Text decoder: 28 layers, 1024 hidden, 16 heads (8 KV heads)
KV cache: Fixed 1024 tokens via CoreML MLState
Requires: macOS 15+ / iOS 18+ (full CoreML mode)
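
The decoder figures above allow a back-of-the-envelope estimate of the fixed KV cache size. The sketch below takes the layer count, head counts, and 1024-token limit from the list; the per-head dimension (`HEAD_DIM = 128`) and the float16 storage (`BYTES_PER_VALUE = 2`) are assumptions that this card does not state, so treat the result as illustrative only.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   max_tokens: int, bytes_per_value: int) -> int:
    """Size of a fixed-length KV cache in bytes."""
    # The factor 2 covers the separate key and value tensors in every layer.
    return 2 * layers * kv_heads * head_dim * max_tokens * bytes_per_value

# Values taken from the architecture list above.
LAYERS, HEADS, KV_HEADS, MAX_TOKENS = 28, 16, 8, 1024
# Assumptions, not stated in this card.
HEAD_DIM = 128
BYTES_PER_VALUE = 2  # float16

# Grouped-query attention: how many query heads share each KV head.
queries_per_kv_head = HEADS // KV_HEADS
total = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, MAX_TOKENS, BYTES_PER_VALUE)

print(queries_per_kv_head)  # 2
print(total / 2**20)        # 112.0 (MiB, under the assumptions above)
```

Because the cache is fixed at 1024 tokens, this memory is allocated up front regardless of utterance length, and the audio embeddings, prompt, and generated text together cannot exceed that many positions.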
