# Omnilingual ASR CTC-7B (MLX 4-bit)

MLX-compatible 4-bit quantization of Meta's Omnilingual ASR CTC-7B model, for on-device inference on Apple Silicon (an M3 Pro or M4 Pro with 16+ GB of unified memory is recommended). Compared with the CTC-3B 4-bit variant, it costs roughly 1 GB of extra disk in exchange for measurably better accuracy on low-resource languages, per Meta's published FLEURS results.

Omnilingual ASR is a wav2vec 2.0-style encoder-only model with a linear CTC head, trained by Meta for speech recognition across 1,600+ languages. The CTC variant is language-agnostic at inference time.

## Model

| Property | Value |
|---|---|
| Parameters | ~7 B |
| Format | MLX safetensors (quantized linear layers + fp16 features) |
| Quantization | 4-bit per-group min-max, group size 64 |
| Sample rate | 16 kHz (raw waveform input) |
| Frame rate | 50 fps |
| Max duration | 40 s |
| Languages | 1,600+ |
| Vocabulary | 10,288 SentencePiece tokens |
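The sample-rate and frame-rate figures above imply one encoder frame per 320 input samples (a 20 ms hop) and 2,000 frames for a maximum-length clip. A quick sanity check of that arithmetic:

```python
SAMPLE_RATE = 16_000   # Hz, raw waveform input
FRAME_RATE = 50        # encoder frames per second
MAX_SECONDS = 40       # maximum supported clip length

samples_per_frame = SAMPLE_RATE // FRAME_RATE   # 320 samples = 20 ms hop
max_frames = FRAME_RATE * MAX_SECONDS           # 2000 frames per utterance

print(samples_per_frame, max_frames)
```

Longer recordings need to be chunked to at most 40 s segments before inference.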

Full architecture details (num_layers / model_dim / ffn_dim) are in config.json.
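The 4-bit per-group min-max scheme named in the table can be sketched in a few lines of numpy. This illustrates the quantization arithmetic only; MLX's actual storage packs the 4-bit values together with per-group scales and biases, so the on-disk layout differs from this sketch.

```python
import numpy as np

def quantize_4bit(w, group_size=64):
    # Per-group asymmetric min-max quantization: each group of 64 values
    # is mapped onto the 16 levels representable with 4 bits (0..15).
    g = w.reshape(-1, group_size)
    lo = g.min(axis=1, keepdims=True)
    hi = g.max(axis=1, keepdims=True)
    scale = (hi - lo) / 15.0
    q = np.round((g - lo) / np.maximum(scale, 1e-12)).astype(np.uint8)
    return q, scale, lo

def dequantize_4bit(q, scale, lo):
    # Inverse mapping back to (approximate) float weights.
    return q * scale + lo

rng = np.random.default_rng(0)
w = rng.standard_normal(128).astype(np.float32)
q, scale, lo = quantize_4bit(w)
w_hat = dequantize_4bit(q, scale, lo).reshape(w.shape)
```

The reconstruction error per value is bounded by half a quantization step (`scale / 2` for that group), which is why accuracy degrades only modestly at 4 bits.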

## Files

| File | Description |
|---|---|
| `model.safetensors` | 4-bit quantized transformer weights + fp16 conv frontend |
| `tokenizer.model` | SentencePiece tokenizer |
| `config.json` | Architecture + quantization metadata |

## Usage

```python
import mlx.core as mx
from safetensors import safe_open

# Load the quantized weights as MLX arrays. framework="mlx" requires a
# recent safetensors release; mx.load("model.safetensors") is an
# equivalent shortcut.
weights = {}
with safe_open("model.safetensors", framework="mlx") as f:
    for k in f.keys():
        weights[k] = f.get_tensor(k)
```
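Once the encoder and CTC head have produced per-frame logits, transcription reduces to taking the argmax token at each frame and applying the standard CTC collapse rule: merge consecutive repeats, then drop blanks. A minimal sketch (the blank id of 0 is an assumption here; the real id comes from the SentencePiece vocabulary in `tokenizer.model`):

```python
def ctc_greedy_decode(frame_token_ids, blank_id=0):
    # Standard CTC greedy decoding: collapse consecutive duplicates,
    # then remove blank tokens.
    out, prev = [], None
    for t in frame_token_ids:
        if t != prev and t != blank_id:
            out.append(t)
        prev = t
    return out

# e.g. frames [blank, A, A, blank, A, B, B, blank] -> [A, A, B]
print(ctc_greedy_decode([0, 5, 5, 0, 5, 7, 7, 0]))
```

The decoded ids are then fed back through the SentencePiece tokenizer to recover text; because the model is language-agnostic, no language tag is needed at this step.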

Swift inference is provided by speech-swift.

## License

Apache 2.0 (inherited from upstream).

