CoreML conversion of Qwen/Qwen3-ASR-1.7B for on-device speech-to-text on macOS/iOS.
Optimized with FP32 compute + ANEMLL RMSNorm + mixed-precision INT8 to prevent decoder overflow while keeping model size small.
| File | Description | Size |
|---|---|---|
| `qwen3_asr_encoder_int8.mlpackage/` | Audio encoder (Conv2D + 24-layer Transformer + projection). FP16 compute, INT8 per-channel weights. | 304 MB |
| `qwen3_asr_decoder_f32_anemll_int8-mixed.mlpackage/` | Decoder (28-layer causal LLM + LM head). FP32 compute, ANEMLL RMSNorm, mixed-precision INT8. | 2.8 GB |
| `qwen3_asr_embeddings.bin` | Token embedding table (FP16, shape 151936 × 2048). Loaded separately to avoid quantizing embeddings. | 594 MB |
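The embedding file's size follows directly from its shape and dtype; a quick sanity check (pure numpy, no model files needed — the load line at the end is a sketch assuming a raw FP16 blob with no header):

```python
import numpy as np

VOCAB, DIM = 151936, 2048  # shape from the file table above
bytes_fp16 = VOCAB * DIM * np.dtype(np.float16).itemsize
size_mib = bytes_fp16 / 2**20
print(round(size_mib))  # 594, matching the listed 594 MB

# Loading sketch (assumes a headerless raw blob):
# emb = np.fromfile("qwen3_asr_embeddings.bin", dtype=np.float16).reshape(VOCAB, DIM)
```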
Audio (16 kHz) → Mel spectrogram (128 bins)
→ Encoder (100-frame chunks → 13 tokens each)
→ Embedding lookup + prompt construction
→ Decoder (autoregressive, RoPE as inputs, KV cache)
→ Token IDs → Text
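The 100-frame chunking step above can be sketched as follows (numpy only; zero-padding the final partial chunk is an assumption, not necessarily what the repo's script does):

```python
import numpy as np

FRAMES_PER_CHUNK = 100   # encoder input length (from the pipeline above)
TOKENS_PER_CHUNK = 13    # audio tokens produced per chunk

def chunk_mel(mel: np.ndarray) -> list[np.ndarray]:
    """Split a [1, 1, 128, T] mel spectrogram into fixed 100-frame chunks."""
    T = mel.shape[-1]
    chunks = []
    for start in range(0, T, FRAMES_PER_CHUNK):
        c = mel[..., start:start + FRAMES_PER_CHUNK]
        if c.shape[-1] < FRAMES_PER_CHUNK:
            # Assumed padding strategy: zero-pad the last partial chunk.
            pad = FRAMES_PER_CHUNK - c.shape[-1]
            c = np.pad(c, [(0, 0)] * 3 + [(0, pad)])
        chunks.append(c)
    return chunks

mel = np.zeros((1, 1, 128, 250), dtype=np.float32)
chunks = chunk_mel(mel)
print(len(chunks), len(chunks) * TOKENS_PER_CHUNK)  # 3 chunks -> 39 audio tokens
```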
Encoder I/O: input `[1, 1, 128, 100]` mel chunk → output `[1, 13, 2048]` audio features.

RMSNorm is implemented with the LayerNorm([x, -x]) trick for numerical stability: the `layer_norm` op is applied to the [x, -x] concatenation for GPU/ANE precision.

| Pipeline | Transcription | Match |
|---|---|---|
| PyTorch FP32 | "And so, my fellow Americans, ask not what your country can do for you. Ask what you can do for your country." | Reference |
| CoreML INT8 | Same text (minor punctuation difference: "." vs ";") | MATCH |
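The LayerNorm([x, -x]) trick mentioned above works because the concatenation has zero mean and its variance equals mean(x²), so LayerNorm collapses to RMSNorm on each half. A numerical check (numpy sketch; hidden size 2048 taken from the model):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 2048)).astype(np.float32)
eps = 1e-6

# Plain RMSNorm: x / sqrt(mean(x^2) + eps)
rms = x / np.sqrt((x * x).mean(axis=-1, keepdims=True) + eps)

# LayerNorm over [x, -x]: mean is exactly 0, variance is mean(x^2),
# so the normalized first half equals RMSNorm(x).
y = np.concatenate([x, -x], axis=-1)
ln = (y - y.mean(axis=-1, keepdims=True)) / np.sqrt(y.var(axis=-1, keepdims=True) + eps)

assert np.allclose(ln[..., :2048], rms, atol=1e-5)
```

This lets the conversion reuse the hardware-optimized `layer_norm` kernel instead of a hand-built RMSNorm graph, which is where the GPU/ANE precision benefit comes from.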
| Metric | Value |
|---|---|
| PyTorch vs CoreML match rate | 19/20 (95%) |
| Average CER (PyTorch) | 0.2009 |
| Average CER (CoreML INT8) | 0.2064 |
| CER difference | +0.0056 |
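The CER values above are character-level edit distance divided by reference length; a minimal self-contained implementation (assuming the evaluation used this standard Levenshtein definition):

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via a single rolling DP row."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,          # deletion
                        dp[j - 1] + 1,      # insertion
                        prev + (ca != cb))  # substitution / match
            prev = cur
    return dp[len(b)]

def cer(ref: str, hyp: str) -> float:
    """Character error rate: edits needed to turn hyp into ref, per ref char."""
    return edit_distance(ref, hyp) / len(ref)

print(edit_distance("kitten", "sitting"))  # 3
```

On the earlier example, a single "." vs ";" substitution contributes one edit, which is why the transcriptions still count as a match up to punctuation.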
Converted using coremltools 9.0 with PyTorch 2.10.0. Conversion scripts are available at the source repository.
compute_precision: FLOAT32 (decoder), FLOAT16 (encoder)
minimum_deployment_target: macOS14
quantization: linear_symmetric INT8 per-channel (mixed-precision for decoder)
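The settings above map onto the coremltools conversion and weight-quantization APIs roughly as follows. This is a sketch, not the repo's actual script: `traced_decoder` and the `inputs` list are placeholders, and which ops the mixed-precision pass excludes is an assumption.

```python
import coremltools as ct
from coremltools.optimize.coreml import (
    OpLinearQuantizerConfig,
    OptimizationConfig,
    linear_quantize_weights,
)

# Assumes `traced_decoder` is a torch.jit-traced module; inputs elided.
mlmodel = ct.convert(
    traced_decoder,
    inputs=[...],  # elided
    compute_precision=ct.precision.FLOAT32,       # decoder runs FP32
    minimum_deployment_target=ct.target.macOS14,
)

# linear_symmetric INT8 per-channel weights; mixed precision would keep
# overflow-prone ops unquantized, e.g. via `op_name_configs` overrides.
cfg = OptimizationConfig(
    global_config=OpLinearQuantizerConfig(
        mode="linear_symmetric", dtype="int8", granularity="per_channel"
    )
)
mlmodel_int8 = linear_quantize_weights(mlmodel, cfg)
mlmodel_int8.save("qwen3_asr_decoder_f32_anemll_int8-mixed.mlpackage")
```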
Apache 2.0 (same as base model)
Base model: Qwen/Qwen3-ASR-1.7B