MLX-compatible 4-bit quantization of Meta's Omnilingual ASR CTC-7B model for on-device inference on Apple Silicon; an M3 Pro or M4 Pro with 16+ GB of unified memory is recommended. It trades ~1 GB of extra disk space versus the CTC-3B 4-bit variant for measurably better accuracy on low-resource languages, per Meta's published FLEURS results.
Omnilingual ASR is a wav2vec 2.0-style encoder-only model with a linear CTC head, trained by Meta for speech recognition across 1,600+ languages. The CTC variant is language-agnostic at inference time.
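Because the model ends in a linear CTC head, transcripts can be recovered from frame-level predictions with plain greedy CTC decoding: collapse consecutive repeats, then drop blanks. A minimal sketch (the blank id and token ids below are illustrative, not taken from this model's tokenizer):

```python
def ctc_greedy_decode(frame_ids, blank_id=0):
    """Greedy CTC decoding: collapse repeated frame predictions, drop blanks.

    frame_ids: per-frame argmax token ids from the CTC head.
    blank_id: illustrative; check the actual blank index for this tokenizer.
    """
    out = []
    prev = None
    for t in frame_ids:
        # Emit a token only when it differs from the previous frame
        # and is not the CTC blank symbol.
        if t != prev and t != blank_id:
            out.append(t)
        prev = t
    return out

# e.g. frames [5, 5, 0, 5, 7, 7, 0] -> tokens [5, 5, 7]
```

The decoded ids would then be mapped back to text with the SentencePiece tokenizer shipped in this repo.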
| Property | Value |
|---|---|
| Parameters | ~7 B |
| Format | MLX safetensors (quantized linear layers + fp16 features) |
| Quantization | 4-bit per-group min-max, group size 64 |
| Sample rate | 16 kHz (raw waveform input) |
| Frame rate | 50 fps |
| Max duration | 40 s |
| Languages | 1,600+ |
| Vocabulary | 10,288 SentencePiece tokens |
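The quantization scheme above can be illustrated with a NumPy sketch: each group of 64 consecutive weights is mapped to 4-bit codes using that group's min and max. This shows the generic min-max recipe only; MLX's actual packed storage layout and kernel-side dequantization differ.

```python
import numpy as np

def quantize_minmax_4bit(w, group_size=64):
    """Per-group min-max quantization to 4-bit codes (0..15).

    Returns integer codes plus per-group scale and bias so that
    w_hat = codes * scale + bias approximates w.
    """
    g = w.reshape(-1, group_size)
    lo = g.min(axis=1, keepdims=True)
    hi = g.max(axis=1, keepdims=True)
    scale = (hi - lo) / 15.0                  # 2**4 - 1 quantization levels
    scale = np.where(scale == 0, 1.0, scale)  # guard against constant groups
    codes = np.round((g - lo) / scale).astype(np.uint8)
    return codes, scale, lo

def dequantize(codes, scale, bias):
    return codes * scale + bias

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 64)).astype(np.float32)
codes, scale, bias = quantize_minmax_4bit(w.ravel())
w_hat = dequantize(codes, scale, bias).reshape(w.shape)
# Rounding bounds the per-element error by half a step (scale / 2).
```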
Full architecture details (`num_layers` / `model_dim` / `ffn_dim`) are in `config.json`.
| File | Description |
|---|---|
| `model.safetensors` | 4-bit quantized transformer weights + fp16 conv frontend |
| `tokenizer.model` | SentencePiece tokenizer |
| `config.json` | Architecture + quantization metadata |
```python
import mlx.core as mx
from safetensors import safe_open

# Load the quantized weights as MLX arrays.
weights = {}
with safe_open("model.safetensors", framework="mlx") as f:
    for k in f.keys():
        weights[k] = f.get_tensor(k)
```
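Since the model expects raw 16 kHz waveforms capped at 40 s (see the table above), longer recordings must be split before inference. A hypothetical NumPy helper; the non-overlapping chunking strategy here is an assumption, not part of this repo:

```python
import numpy as np

SAMPLE_RATE = 16_000  # model input rate, per the table above
MAX_SECONDS = 40      # model's maximum input duration

def chunk_waveform(wav, max_seconds=MAX_SECONDS, sample_rate=SAMPLE_RATE):
    """Split a 1-D waveform into consecutive chunks of at most max_seconds."""
    step = max_seconds * sample_rate
    return [wav[i:i + step] for i in range(0, len(wav), step)]

# 90 s of audio -> three chunks: 40 s, 40 s, 10 s
wav = np.zeros(90 * SAMPLE_RATE, dtype=np.float32)
chunks = chunk_waveform(wav)
```

Each chunk can then be fed through the encoder and the per-chunk transcripts concatenated; overlap-based stitching would reduce boundary errors but is out of scope here.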
Swift inference is provided by speech-swift.
License: Apache 2.0 (inherited from the upstream model).
Quantized from the base model facebook/omniASR-CTC-7B.