
# Caspi-1.7B CoreML

CoreML conversion of OzLabs/Caspi-1.7B for on-device Hebrew speech recognition on Apple Silicon (macOS/iOS).

Caspi is a Hebrew-optimized fine-tune of Qwen/Qwen3-ASR-1.7B, achieving ~5% WER on Hebrew benchmarks.
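
For context, WER counts word-level substitutions, deletions, and insertions against a reference transcript. A minimal sketch (illustrative only; not the benchmark harness behind the ~5% figure):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over word sequences, one DP row at a time.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution (or match)
            ))
        prev = cur
    return prev[-1] / max(len(ref), 1)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # 1 deletion / 6 words
```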

## Model Details

| Property | Value |
|---|---|
| Base model | OzLabs/Caspi-1.7B (fine-tuned from Qwen/Qwen3-ASR-1.7B) |
| Architecture | Qwen3-ASR (audio encoder + LLM decoder) |
| Parameters | ~2B total (1.7B decoder + audio encoder) |
| Quantization | Int8 (linear quantization via coremltools) |
| Primary language | Hebrew |
| License | CC-BY-NC-4.0 (inherited from OzLabs/Caspi-1.7B) |

## Files

```
qwen3_asr_audio_encoder_v2.mlmodelc/   # Audio encoder (606 MB)
qwen3_asr_decoder_stateful.mlmodelc/   # Fused decoder + LM head with KV-cache (1.6 GB)
qwen3_asr_embeddings.bin               # Token embeddings (151936 x 2048, float16, 594 MB)
vocab.json                             # Vocabulary (151,643 tokens)
```
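
The embeddings file size follows directly from the shape above. A quick sanity check (pure arithmetic; the 151,936 embedding rows exceed the 151,643-entry vocab.json because Qwen-family models pad the embedding matrix with reserved/special rows):

```python
# Expected size of qwen3_asr_embeddings.bin: rows x dims x 2 bytes (float16).
VOCAB_ROWS, EMBED_DIM, FP16_BYTES = 151_936, 2_048, 2

expected_bytes = VOCAB_ROWS * EMBED_DIM * FP16_BYTES
print(expected_bytes)          # 622,329,856 bytes
print(expected_bytes / 2**20)  # ~593.5 MiB, i.e. the "594 MB" listed above
```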

## Architecture

| Component | Details |
|---|---|
| Audio encoder | 24 layers, d_model=1024, 16 heads, output_dim=2048 |
| Text decoder | 28 layers, hidden_size=2048, 16 query heads, 8 KV heads (GQA), head_dim=128 |
| Vocabulary | 151,936 embedding rows (151,643-token byte-level BPE vocabulary plus reserved/special tokens) |
| Decoding | Autoregressive with stateful KV-cache (max 512 tokens) |
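
A rough estimate of the stateful KV-cache footprint implied by the table (hedged: assumes float16 cache entries, one K and one V tensor per layer):

```python
# KV cache bytes = 2 (K and V) x layers x KV heads x head_dim x max tokens x 2 bytes (fp16).
LAYERS, KV_HEADS, HEAD_DIM, MAX_TOKENS, FP16_BYTES = 28, 8, 128, 512, 2

kv_cache_bytes = 2 * LAYERS * KV_HEADS * HEAD_DIM * MAX_TOKENS * FP16_BYTES
print(kv_cache_bytes / 2**20)  # 56.0 MiB -- small next to the 1.6 GB decoder weights
```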

## Performance (Apple Silicon)

Tested on an M5 Pro (48 GB):

| Metric | Value |
|---|---|
| RTFx (release build) | ~2.15x (faster than real time) |
| Peak memory | ~6.3 GB |
| Model size on disk | ~2.8 GB |
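
RTFx here is throughput relative to real time: audio duration divided by processing time, so values above 1 are faster than real time. A small sketch using the ~2.15x figure:

```python
# At a given RTFx, processing time = audio duration / RTFx.
def processing_time(audio_seconds: float, rtfx: float) -> float:
    return audio_seconds / rtfx

print(round(processing_time(60.0, 2.15), 1))  # a 60 s clip transcribes in ~27.9 s
```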

## Usage with FluidAudio

These models are designed for use with FluidAudio's Qwen3AsrManager:

```swift
import FluidAudio

let manager = Qwen3AsrManager()
try await manager.loadModels(from: modelDirectory)

let samples = try AudioConverter().resampleAudioFile(audioURL)
let text = try await manager.transcribe(audioSamples: samples, language: "he")
print(text)
```

Note: FluidAudio's `Qwen3AsrConfig` must be updated for the 1.7B dimensions (hidden_size=2048, etc.). See the caspi-1.7b branch of alandotcom/FluidAudio for the required config changes.

## Usage with Hex

A fork of Hex (a macOS dictation app) with Caspi support is available on the caspi-hebrew branch of alandotcom/Hex.

To use:

  1. Download the model files to ~/Library/Application Support/FluidAudio/Models/caspi-1.7b-coreml/
  2. Clone and build the Hex fork
  3. Select "Caspi 1.7B (Hebrew)" in Settings, set language to Hebrew
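
After step 1, the layout can be verified with a small helper (hedged: this script is not part of Hex or FluidAudio, just a convenience):

```python
from pathlib import Path

# The four artifacts listed in the Files section above.
REQUIRED = [
    "qwen3_asr_audio_encoder_v2.mlmodelc",
    "qwen3_asr_decoder_stateful.mlmodelc",
    "qwen3_asr_embeddings.bin",
    "vocab.json",
]

def missing_files(model_dir: Path) -> list[str]:
    """Return the required files/directories not present under model_dir."""
    return [name for name in REQUIRED if not (model_dir / name).exists()]

model_dir = Path.home() / "Library/Application Support/FluidAudio/Models/caspi-1.7b-coreml"
print(missing_files(model_dir))  # empty list once everything is in place
```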

## Conversion

Conversion scripts are available at alandotcom/caspi-hebrew-asr, forked from FluidInference/mobius with dimensions updated for the 1.7B architecture.

To reproduce:

```bash
git clone https://github.com/alandotcom/caspi-hebrew-asr.git
cd caspi-hebrew-asr/conversion
uv sync
uv run python convert-qwen3-asr.py      # full f32 conversion
uv run python convert_decoder_fused.py  # fused stateful decoder
uv run python extract_embeddings.py     # embeddings + vocab
uv run python quantize_model.py input.mlpackage output.mlpackage --dtype int8  # quantize
```

Conversion pipeline:

  1. Audio encoder: traced with coremltools, FP16 precision
  2. Decoder: fused stateful decoder with LM head baked in, FLOAT32 compute precision (required to avoid float16 overflow in RMSNorm)
  3. Embeddings: extracted as raw float16 binary
  4. Post-training int8 weight quantization applied to decoder
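
Step 4 roughly halves the decoder's weight payload; a back-of-envelope check (hedged: exact on-disk size also depends on Core ML metadata and any layers left unquantized):

```python
# ~1.7B decoder parameters: 2 bytes each in fp16, 1 byte each after int8 quantization.
decoder_params = 1.7e9

fp16_gb = decoder_params * 2 / 1e9  # ~3.4 GB unquantized
int8_gb = decoder_params * 1 / 1e9  # ~1.7 GB, close to the 1.6 GB decoder on disk
print(fp16_gb, int8_gb)
```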

## License

This model inherits the CC-BY-NC-4.0 license from OzLabs/Caspi-1.7B.

The base model Qwen/Qwen3-ASR-1.7B is Apache-2.0 licensed.

The conversion scripts are from FluidInference/mobius (Apache-2.0).

## Citations

### Caspi

```bibtex
@misc{caspi_hebrew_asr,
  title={Caspi-1.7B: Hebrew ASR fine-tuned from Qwen3-ASR-1.7B},
  author={Oz Labs},
  year={2026},
  howpublished={Hugging Face model card}
}
```

### Qwen3-ASR

```bibtex
@article{qwen3asr,
  title={Qwen3-ASR Technical Report},
  author={Qwen Team},
  journal={arXiv preprint arXiv:2601.21337},
  year={2025}
}
```