# MMS-LID-256 on MLX

Meta's MMS-LID-256 language identification model running on MLX (Apple Silicon Metal GPU).
Identifies 256 languages from raw audio. Uses weights directly from facebook/mms-lid-256; no conversion needed.
## Performance (M1, 10s audio)
| Framework | Latency | Russian | English |
|---|---|---|---|
| Python MLX (with `mx.compile()`) | 267ms | 98.8% | 99.8% |
| Swift MLX (with `compile()`) | 268ms | 97.3% | 99.7% |
| CoreML GPU | 250ms | 89.1% | 99.8% |
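Latency numbers like these are typically gathered with a warmup-then-repeat harness, since the first calls absorb one-time costs (`mx.compile` tracing, weight upload) and sustained runs can throttle. A minimal sketch of such a harness (the `bench` helper and the stand-in workload are hypothetical, not part of the repo):

```python
import time

def bench(fn, *args, warmup=3, iters=10):
    """Median wall-clock latency of fn(*args) in milliseconds.

    Warmup iterations absorb one-time costs (e.g. mx.compile tracing);
    the median is robust to thermal-throttling outliers.
    """
    for _ in range(warmup):
        fn(*args)
    times = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn(*args)
        times.append((time.perf_counter() - t0) * 1000.0)
    return sorted(times)[len(times) // 2]

# Stand-in workload in place of a real model forward pass
latency_ms = bench(lambda: sum(i * i for i in range(100_000)))
```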
## Usage
```sh
# Clone the benchmark repo
git clone https://github.com/beshkenadze/lid-bench
cd lid-bench/mlx

# Setup
uv venv && uv pip install mlx numpy soundfile safetensors huggingface_hub

# Download weights (from the original facebook repo)
huggingface-cli download facebook/mms-lid-256 --include "model.safetensors" "config.json"

# Run
python mms_lid_256.py path/to/audio.wav --benchmark
```
Full implementation: github.com/beshkenadze/lid-bench
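The model consumes raw 16 kHz audio normalized to zero mean and unit variance. A minimal numpy sketch of that preprocessing (the `1e-7` epsilon mirrors HF's `Wav2Vec2FeatureExtractor` convention; the synthetic clip stands in for a real recording):

```python
import numpy as np

def normalize(waveform: np.ndarray) -> np.ndarray:
    """Zero-mean, unit-variance normalization of a raw 16 kHz waveform,
    in the style of HF's Wav2Vec2FeatureExtractor (do_normalize=True)."""
    waveform = waveform.astype(np.float32)
    return (waveform - waveform.mean()) / np.sqrt(waveform.var() + 1e-7)

# Synthetic 10 s clip at 16 kHz in place of a real recording
rng = np.random.default_rng(0)
audio = rng.standard_normal(16_000 * 10).astype(np.float32) * 0.1
x = normalize(audio)
```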
## Model Details
- Architecture: Wav2Vec2ForSequenceClassification (48 transformer layers, 16 heads)
- Input: Raw 16kHz waveform (zero-mean unit-variance normalized)
- Output: 256 language probabilities
- Parameters: 315M
- Weight format: Original HF safetensors (no conversion needed)
- Weight loading: Conv1d axis swap + weight_norm precomputation done at load time
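The two load-time weight transforms mentioned above can be sketched in numpy. The shapes and the `dim=2` choice are assumptions based on the usual HF Wav2Vec2 layout (HF Conv1d weights are channels-first; MLX expects channels-last; the positional conv uses `weight_norm` over dim 2):

```python
import numpy as np

def conv1d_hf_to_mlx(w: np.ndarray) -> np.ndarray:
    """HF Conv1d stores weights as (out_channels, in_channels, kernel);
    MLX's Conv1d expects (out_channels, kernel, in_channels)."""
    return np.transpose(w, (0, 2, 1))

def fold_weight_norm(g: np.ndarray, v: np.ndarray, dim: int = 2) -> np.ndarray:
    """Precompute w = g * v / ||v||, collapsing the weight_norm
    parametrization (weight_g / weight_v) into a single tensor.
    The norm runs over every axis except `dim` (dim=2 assumed here,
    matching HF's positional convolution)."""
    axes = tuple(a for a in range(v.ndim) if a != dim)
    norm = np.sqrt((v ** 2).sum(axis=axes, keepdims=True))
    return g * v / norm

w = np.ones((4, 3, 5), dtype=np.float32)  # (out, in, kernel)
w_mlx = conv1d_hf_to_mlx(w)               # (out, kernel, in)
```

Folding `weight_norm` at load time means inference sees a single plain weight tensor, so no per-step renormalization is needed.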
## Notes
- Weights are loaded directly from the `facebook/mms-lid-256` HF cache; no separate conversion step
- This repo contains only the model card and MLX implementation reference
- ANE causes a 13x slowdown; use Metal GPU (`.cpuAndGPU`) only
- Sustained inference on M1 degrades to ~400ms due to thermal throttling (48 transformer layers); M2+ should fare better
## Model tree for beshkenadze/mms-lid-256-mlx

- Base model: facebook/mms-lid-256