# Omnilingual ASR – CTC 300M (MLX 8-bit)
MLX-compatible 8-bit quantization of Meta's Omnilingual ASR CTC-300M model, targeting on-device inference on Apple Silicon (M1/M2/M3/M4). Prefer this variant when you want the smallest WER regression from fp32 and can spare ~150 MB of extra disk space compared to the 4-bit build.
Omnilingual ASR is a wav2vec 2.0-style encoder-only model with a linear CTC head, trained by Meta for speech recognition across 1,600+ languages. The CTC variant is language-agnostic at inference time.
## Model

| Property | Value |
|---|---|
| Parameters | 326 M |
| Format | MLX safetensors (quantized linear layers + fp16 features) |
| Quantization | 8-bit per-group min-max, group size 64 |
| Sample rate | 16 kHz (raw waveform input) |
| Frame rate | 50 fps |
| Max duration | 40 s |
| Languages | 1,600+ |
| Vocabulary | 10,288 SentencePiece tokens |
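The sample rate, frame rate, and maximum duration in the table are linked by the conv frontend's total stride. A minimal sketch of that arithmetic (plain Python, no MLX required):

```python
SAMPLE_RATE = 16_000   # Hz, raw waveform input
DOWNSAMPLE = 320       # total stride of the 7-layer conv frontend
FRAME_RATE = SAMPLE_RATE // DOWNSAMPLE  # 16,000 / 320 = 50 encoder frames per second
MAX_SECONDS = 40

max_samples = SAMPLE_RATE * MAX_SECONDS  # samples in a maximum-length clip
max_frames = max_samples // DOWNSAMPLE   # encoder frames the CTC head scores

print(FRAME_RATE, max_samples, max_frames)  # 50 640000 2000
```

So a full 40 s clip yields 2,000 encoder frames, each scored against the 10,288-token vocabulary by the CTC head.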
## Files

| File | Size | Description |
|---|---|---|
| `model.safetensors` | 342 MB | 8-bit quantized transformer weights + fp16 conv frontend |
| `tokenizer.model` | 1.2 MB | SentencePiece tokenizer |
| `config.json` | <1 KB | Architecture + quantization metadata |
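For intuition on what "8-bit per-group min-max, group size 64" means for the quantized weights, here is a NumPy sketch of that scheme. This is an illustration of the general technique, not MLX's exact kernel or storage layout:

```python
import numpy as np

GROUP_SIZE = 64
BITS = 8

def quantize_minmax(w: np.ndarray):
    """Asymmetric per-group min-max quantization along the last axis."""
    g = w.reshape(-1, GROUP_SIZE)
    lo = g.min(axis=1, keepdims=True)
    hi = g.max(axis=1, keepdims=True)
    scale = (hi - lo) / (2**BITS - 1)
    scale = np.where(scale == 0, 1.0, scale)  # guard constant groups
    q = np.round((g - lo) / scale).astype(np.uint8)
    return q, scale, lo

def dequantize(q, scale, lo, shape):
    return (q.astype(np.float32) * scale + lo).reshape(shape)

w = np.random.randn(4, 128).astype(np.float32)
q, scale, lo = quantize_minmax(w)
w_hat = dequantize(q, scale, lo, w.shape)
err = np.abs(w - w_hat).max()  # rounding error is bounded by scale / 2 per group
```

Each group of 64 weights stores its own scale and offset, so outliers in one group do not degrade precision elsewhere in the tensor.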
## Performance

See the 4-bit variant for FLEURS numbers; the 8-bit build should land within 0.2–0.5% absolute WER of fp32.
## Architecture

Wav2Vec2FeatureExtractor (7-layer CNN, 320× downsample) → Linear 512→1024 → convolutional position encoder → 24× pre-norm Transformer encoder (dim 1024, 16 heads, FFN 4096) → LayerNorm → linear CTC head (→ 10,288 tokens).
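The CTC head emits one token distribution per encoder frame; a transcript is recovered by best-path decoding: take the argmax per frame, collapse consecutive repeats, then drop blanks. A minimal sketch (the blank id of 0 here is an assumption; the real index comes from the tokenizer/config):

```python
def ctc_greedy_decode(frame_ids, blank_id=0):
    """Best-path CTC decoding: collapse consecutive repeats, then drop blanks."""
    out = []
    prev = None
    for t in frame_ids:
        if t != prev and t != blank_id:
            out.append(t)
        prev = t
    return out

# With blank_id=0: [0, 7, 7, 0, 7, 3, 3, 0] -> [7, 7, 3]
# (the blank between the two 7s keeps them as separate tokens)
```

The resulting token ids would then be mapped to text with the SentencePiece tokenizer.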
## Source
- Upstream model: facebook/omniASR-CTC-300M
- Paper: Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
- Meta blog: Omnilingual ASR announcement
## Links

- speech-swift (Apple SDK): soniqo/speech-swift on GitHub
- Website & docs: soniqo.audio
- Guide: soniqo.audio/guides/omnilingual
- blog

## License

Apache 2.0 (inherited from the upstream model).