# Omnilingual ASR – CTC 300M (MLX 8-bit)
MLX-compatible 8-bit quantization of Meta's Omnilingual ASR CTC-300M model, targeting on-device inference on Apple Silicon (M1/M2/M3/M4). Prefer this variant when you want the smallest WER regression from fp32 and can spare ~150 MB of extra disk space compared to the 4-bit build.
Omnilingual ASR is a wav2vec 2.0-style encoder-only model with a linear CTC head, trained by Meta for speech recognition across 1,600+ languages. The CTC variant is language-agnostic at inference time.
## Model

| Property | Value |
|---|---|
| Parameters | 326 M |
| Format | MLX safetensors (quantized linear layers + fp16 features) |
| Quantization | 8-bit per-group min-max, group size 64 |
| Sample rate | 16 kHz (raw waveform input) |
| Frame rate | 50 fps |
| Max duration | 40 s |
| Languages | 1,600+ |
| Vocabulary | 10,288 SentencePiece tokens |
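The sample rate, frame rate, and maximum duration in the table are linked by the conv frontend's total stride. A minimal sketch of that arithmetic (plain Python, no MLX required):

```python
SAMPLE_RATE = 16_000   # Hz, raw waveform input
DOWNSAMPLE = 320       # total stride of the 7-layer conv frontend
FRAME_RATE = SAMPLE_RATE // DOWNSAMPLE  # 16,000 / 320 = 50 encoder frames per second
MAX_SECONDS = 40

max_samples = SAMPLE_RATE * MAX_SECONDS  # samples in a maximum-length clip
max_frames = max_samples // DOWNSAMPLE   # encoder frames the CTC head scores

print(FRAME_RATE, max_samples, max_frames)  # 50 640000 2000
```

So a full 40 s clip yields 2,000 encoder frames, each scored against the 10,288-token vocabulary by the CTC head.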
## Files

| File | Size | Description |
|---|---|---|
| `model.safetensors` | 342 MB | 8-bit quantized transformer weights + fp16 conv frontend |
| `tokenizer.model` | 1.2 MB | SentencePiece tokenizer |
| `config.json` | <1 KB | Architecture + quantization metadata |
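For intuition on what "8-bit per-group min-max, group size 64" means for the quantized weights, here is a NumPy sketch of that scheme. This is an illustration of the general technique, not MLX's exact kernel or storage layout:

```python
import numpy as np

GROUP_SIZE = 64
BITS = 8

def quantize_minmax(w: np.ndarray):
    """Asymmetric per-group min-max quantization along the last axis."""
    g = w.reshape(-1, GROUP_SIZE)
    lo = g.min(axis=1, keepdims=True)
    hi = g.max(axis=1, keepdims=True)
    scale = (hi - lo) / (2**BITS - 1)
    scale = np.where(scale == 0, 1.0, scale)  # guard constant groups
    q = np.round((g - lo) / scale).astype(np.uint8)
    return q, scale, lo

def dequantize(q, scale, lo, shape):
    return (q.astype(np.float32) * scale + lo).reshape(shape)

w = np.random.randn(4, 128).astype(np.float32)
q, scale, lo = quantize_minmax(w)
w_hat = dequantize(q, scale, lo, w.shape)
err = np.abs(w - w_hat).max()  # rounding error is bounded by scale / 2 per group
```

Each group of 64 weights stores its own scale and offset, so outliers in one group do not degrade precision elsewhere in the tensor.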
## Performance

See the 4-bit variant for FLEURS numbers; the 8-bit build should land within 0.2–0.5% absolute WER of fp32.
## Architecture

Wav2Vec2FeatureExtractor (7-layer CNN, 320× downsample) → Linear 512→1024 → convolutional position encoder → 24× pre-norm Transformer encoder (dim 1024, 16 heads, FFN 4096) → LayerNorm → linear CTC head (→ 10,288 tokens).
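The CTC head emits one token distribution per encoder frame; a transcript is recovered by best-path decoding: take the argmax per frame, collapse consecutive repeats, then drop blanks. A minimal sketch (the blank id of 0 here is an assumption; the real index comes from the tokenizer/config):

```python
def ctc_greedy_decode(frame_ids, blank_id=0):
    """Best-path CTC decoding: collapse consecutive repeats, then drop blanks."""
    out = []
    prev = None
    for t in frame_ids:
        if t != prev and t != blank_id:
            out.append(t)
        prev = t
    return out

# With blank_id=0: [0, 7, 7, 0, 7, 3, 3, 0] -> [7, 7, 3]
# (the blank between the two 7s keeps them as separate tokens)
```

The resulting token ids would then be mapped to text with the SentencePiece tokenizer.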
## Source
- Upstream model: facebook/omniASR-CTC-300M
- Paper: Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
- Meta blog: Omnilingual ASR announcement
## Links

- speech-swift (Apple SDK): soniqo/speech-swift on GitHub
- Website & docs: soniqo.audio
- Guide: soniqo.audio/guides/omnilingual
- blog

## License

Apache 2.0 (inherited from the upstream model).