# Omnilingual ASR — CTC 300M (MLX 8-bit)

MLX-compatible 8-bit quantization of Meta's Omnilingual ASR CTC-300M model, targeting on-device inference on Apple Silicon (M1/M2/M3/M4). Prefer this variant when you need the smallest possible WER regression from fp32 and are willing to trade ~150 MB of extra disk compared to the 4-bit build.

Omnilingual ASR is a wav2vec 2.0–style encoder-only model with a linear CTC head, trained by Meta for speech recognition across 1,600+ languages. The CTC variant is language-agnostic at inference time.
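For context, a minimal transcription sketch on MLX is shown below. Only `mx.load` and the SentencePiece calls are real APIs; the `OmnilingualCTC` class and the `load_audio` helper are placeholders for whatever MLX port of Omnilingual ASR you actually run, and the CTC blank is assumed to be token id 0.

```python
import mlx.core as mx
import sentencepiece as spm

# Load the quantized weight dict and the tokenizer shipped in this repo.
weights = mx.load("model.safetensors")
sp = spm.SentencePieceProcessor(model_file="tokenizer.model")

model = OmnilingualCTC.from_weights(weights)           # hypothetical constructor
audio = load_audio("sample.wav", sample_rate=16_000)   # raw mono waveform, <= 40 s

logits = model(mx.array(audio)[None])                  # (1, frames, 10_288)
ids = mx.argmax(logits, axis=-1)[0].tolist()           # greedy CTC path

# Collapse repeated ids, then drop CTC blanks (assumed id 0 here).
collapsed = [i for i, prev in zip(ids, [None] + ids[:-1]) if i != prev]
print(sp.decode([i for i in collapsed if i != 0]))
```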

## Model

| Property | Value |
|---|---|
| Parameters | 326 M |
| Format | MLX safetensors (quantized linear layers + fp16 features) |
| Quantization | 8-bit per-group min-max, group size 64 |
| Sample rate | 16 kHz (raw waveform input) |
| Frame rate | 50 fps |
| Max duration | 40 s |
| Languages | 1,600+ |
| Vocabulary | 10,288 SentencePiece tokens |
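The quantization scheme is MLX's stock per-group affine quantization: every group of 64 consecutive weights shares one scale and one offset. You can reproduce the round trip on a toy matrix with `mlx.core` directly; this sketch only illustrates the group-size-64, 8-bit scheme, not this checkpoint's actual weights.

```python
import mlx.core as mx

# Per-group affine (min-max) quantization as implemented by mlx.core:
# each group of 64 consecutive weights gets its own scale and bias, and a
# weight is reconstructed as w ~ scale * q + bias with q in [0, 255].
w = mx.random.normal((1024, 4096))                        # toy fp32 weight matrix
w_q, scales, biases = mx.quantize(w, group_size=64, bits=8)

# Round-trip to see the per-element reconstruction error.
w_hat = mx.dequantize(w_q, scales, biases, group_size=64, bits=8)
print(mx.abs(w - w_hat).max())                            # small: 256 levels per group
```

With 256 levels per group of 64, the reconstruction error stays small, which is why the WER regression quoted under Performance is modest relative to the 4-bit build.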

## Files

| File | Size | Description |
|---|---|---|
| `model.safetensors` | 342 MB | 8-bit quantized transformer weights + fp16 conv frontend |
| `tokenizer.model` | 1.2 MB | SentencePiece tokenizer |
| `config.json` | <1 KB | Architecture + quantization metadata |
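To sanity-check a download before building anything, you can count the tensors and read the quantization metadata; the `quantization` key below follows the usual MLX config convention and is an assumption about this repo's exact layout.

```python
import json
import mlx.core as mx

# Peek at the checkpoint without instantiating a model.
weights = mx.load("model.safetensors")
print(len(weights), "tensors")

with open("config.json") as f:
    cfg = json.load(f)
print(cfg.get("quantization"))   # expected along the lines of {"group_size": 64, "bits": 8}
```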

## Performance

See the 4-bit variant for FLEURS numbers; the 8-bit build should land within 0.2–0.5% absolute WER of fp32.

## Architecture

Wav2Vec2FeatureExtractor (7-layer CNN, 320× downsample) → Linear 512→1024 → conv position encoder → 24× pre-norm Transformer encoder (dim 1024, 16 heads, ffn 4096) → LayerNorm → Linear CTC head (→ 10,288 tokens).
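As a shape check on the dimensions above, here is a hedged `mlx.nn` sketch of the pipeline's tail (final LayerNorm plus the CTC projection) together with the frame-rate arithmetic the 320× downsample implies; the class and attribute names are illustrative, not the checkpoint's key names.

```python
import mlx.core as mx
import mlx.nn as nn

# 16 kHz input / 320x CNN downsample = 50 frames/s, so the 40 s cap
# corresponds to at most 40 * 50 = 2_000 encoder frames.
FRAMES_PER_SECOND = 16_000 // 320        # 50

class CTCHead(nn.Module):
    """Final LayerNorm + linear CTC projection (names illustrative)."""

    def __init__(self, dim: int = 1024, vocab: int = 10_288):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.proj = nn.Linear(dim, vocab)

    def __call__(self, x: mx.array) -> mx.array:
        # x: (batch, frames, 1024) encoder states -> (batch, frames, 10_288) logits
        return self.proj(self.norm(x))

head = CTCHead()
logits = head(mx.zeros((1, 40 * FRAMES_PER_SECOND, 1024)))
print(logits.shape)                      # (1, 2000, 10288)
```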

## License

Apache 2.0 (inherited from upstream).

