# MMS-LID-256 on MLX

Meta's MMS-LID-256 language identification model running on MLX (Apple Silicon Metal GPU).
Identifies 256 languages from raw audio. Uses weights directly from facebook/mms-lid-256; no conversion needed.
## Performance (M1, 10s audio)
| Framework | Latency | Russian | English |
|---|---|---|---|
| Python MLX (with `mx.compile()`) | 267ms | 98.8% | 99.8% |
| Swift MLX (with `compile()`) | 268ms | 97.3% | 99.7% |
| CoreML GPU | 250ms | 89.1% | 99.8% |
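Latency numbers like these are typically gathered with a warmup-then-repeat harness, since the first calls absorb one-time costs (`mx.compile` tracing, weight upload) and sustained runs can throttle. A minimal sketch of such a harness (the `bench` helper and the stand-in workload are hypothetical, not part of the repo):

```python
import time

def bench(fn, *args, warmup=3, iters=10):
    """Median wall-clock latency of fn(*args) in milliseconds.

    Warmup iterations absorb one-time costs (e.g. mx.compile tracing);
    the median is robust to thermal-throttling outliers.
    """
    for _ in range(warmup):
        fn(*args)
    times = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn(*args)
        times.append((time.perf_counter() - t0) * 1000.0)
    return sorted(times)[len(times) // 2]

# Stand-in workload in place of a real model forward pass
latency_ms = bench(lambda: sum(i * i for i in range(100_000)))
```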
## Usage
```sh
# Clone the benchmark repo
git clone https://github.com/beshkenadze/lid-bench
cd lid-bench/mlx

# Setup
uv venv && uv pip install mlx numpy soundfile safetensors huggingface_hub

# Download weights (from the original facebook repo)
huggingface-cli download facebook/mms-lid-256 --include "model.safetensors" "config.json"

# Run
python mms_lid_256.py path/to/audio.wav --benchmark
```
Full implementation: github.com/beshkenadze/lid-bench
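The model consumes raw 16 kHz audio normalized to zero mean and unit variance. A minimal numpy sketch of that preprocessing (the `1e-7` epsilon mirrors HF's `Wav2Vec2FeatureExtractor` convention; the synthetic clip stands in for a real recording):

```python
import numpy as np

def normalize(waveform: np.ndarray) -> np.ndarray:
    """Zero-mean, unit-variance normalization of a raw 16 kHz waveform,
    in the style of HF's Wav2Vec2FeatureExtractor (do_normalize=True)."""
    waveform = waveform.astype(np.float32)
    return (waveform - waveform.mean()) / np.sqrt(waveform.var() + 1e-7)

# Synthetic 10 s clip at 16 kHz in place of a real recording
rng = np.random.default_rng(0)
audio = rng.standard_normal(16_000 * 10).astype(np.float32) * 0.1
x = normalize(audio)
```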
## Model Details
- Architecture: Wav2Vec2ForSequenceClassification (48 transformer layers, 16 heads)
- Input: Raw 16kHz waveform (zero-mean unit-variance normalized)
- Output: 256 language probabilities
- Parameters: 315M
- Weight format: Original HF safetensors (no conversion needed)
- Weight loading: Conv1d axis swap + weight_norm precomputation done at load time
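The two load-time weight transforms mentioned above can be sketched in numpy. The shapes and the `dim=2` choice are assumptions based on the usual HF Wav2Vec2 layout (HF Conv1d weights are channels-first; MLX expects channels-last; the positional conv uses `weight_norm` over dim 2):

```python
import numpy as np

def conv1d_hf_to_mlx(w: np.ndarray) -> np.ndarray:
    """HF Conv1d stores weights as (out_channels, in_channels, kernel);
    MLX's Conv1d expects (out_channels, kernel, in_channels)."""
    return np.transpose(w, (0, 2, 1))

def fold_weight_norm(g: np.ndarray, v: np.ndarray, dim: int = 2) -> np.ndarray:
    """Precompute w = g * v / ||v||, collapsing the weight_norm
    parametrization (weight_g / weight_v) into a single tensor.
    The norm runs over every axis except `dim` (dim=2 assumed here,
    matching HF's positional convolution)."""
    axes = tuple(a for a in range(v.ndim) if a != dim)
    norm = np.sqrt((v ** 2).sum(axis=axes, keepdims=True))
    return g * v / norm

w = np.ones((4, 3, 5), dtype=np.float32)  # (out, in, kernel)
w_mlx = conv1d_hf_to_mlx(w)               # (out, kernel, in)
```

Folding `weight_norm` at load time means inference sees a single plain weight tensor, so no per-step renormalization is needed.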
## Notes
- Weights are loaded directly from the `facebook/mms-lid-256` HF cache; no separate conversion step
- This repo contains only the model card and MLX implementation reference
- ANE causes a 13x slowdown; use Metal GPU (`.cpuAndGPU`) only
- Sustained inference on M1 degrades to ~400ms due to thermal throttling (48 transformer layers); M2+ should fare better
## Model tree for beshkenadze/mms-lid-256-mlx

- Base model: facebook/mms-lid-256