MMS-LID-256 (MLX)

Meta's MMS-LID-256 language identification model running on MLX (Apple Silicon Metal GPU).

Identifies 256 languages from raw audio. Uses the weights directly from facebook/mms-lid-256; no conversion needed.

Performance (M1, 10s audio)

| Framework | Latency | Russian | English |
|---|---|---|---|
| Python MLX (with `mx.compile()`) | 267 ms | 98.8% | 99.8% |
| Swift MLX (with `compile()`) | 268 ms | 97.3% | 99.7% |
| CoreML GPU | 250 ms | 89.1% | 99.8% |
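The latencies above reward warm-up: the first MLX call pays for `mx.compile()` tracing and Metal kernel compilation, so cold-start timings are misleading. A minimal timing harness along those lines is sketched below; `benchmark` and the `np.tanh` stand-in workload are illustrative, not taken from the repo.

```python
import time
import numpy as np

def benchmark(fn, *args, warmup=3, repeats=10):
    """Median wall-clock latency in ms, after warm-up runs.

    Warm-up matters for MLX: the first call triggers compile
    tracing and Metal kernel compilation.
    """
    for _ in range(warmup):
        fn(*args)
    times = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(*args)
        times.append((time.perf_counter() - t0) * 1e3)
    return float(np.median(times))

# Stand-in for the model forward pass: 10 s of 16 kHz audio.
audio = np.random.randn(160_000).astype(np.float32)
latency_ms = benchmark(lambda x: np.tanh(x).sum(), audio)
print(f"{latency_ms:.2f} ms")
```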

Usage

```bash
# Clone the benchmark repo
git clone https://github.com/beshkenadze/lid-bench
cd lid-bench/mlx

# Setup
uv venv && uv pip install mlx numpy soundfile safetensors huggingface_hub

# Download weights (from the original facebook repo)
huggingface-cli download facebook/mms-lid-256 --include "model.safetensors" "config.json"

# Run
python mms_lid_256.py path/to/audio.wav --benchmark
```

Full implementation: github.com/beshkenadze/lid-bench
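As the Model Details below note, the model expects a zero-mean, unit-variance 16 kHz waveform. A minimal preprocessing sketch (the real entry point is `mms_lid_256.py` in the repo; a synthetic sine wave stands in for `soundfile.read("audio.wav")` so the snippet runs anywhere):

```python
import numpy as np

def normalize_waveform(wav: np.ndarray) -> np.ndarray:
    """Zero-mean, unit-variance normalization, as Wav2Vec2 expects.

    A small epsilon guards against division by zero on silent input.
    """
    wav = wav.astype(np.float32)
    return (wav - wav.mean()) / np.sqrt(wav.var() + 1e-7)

# Synthetic stand-in for a 10 s, 16 kHz mono recording.
sr = 16_000
t = np.linspace(0.0, 10.0, 10 * sr, endpoint=False)
wav = 0.1 * np.sin(2 * np.pi * 220.0 * t).astype(np.float32)

x = normalize_waveform(wav)
print(x.mean(), x.std())  # ~0.0 and ~1.0
```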

Model Details

  • Architecture: Wav2Vec2ForSequenceClassification (48 transformer layers, 16 heads)
  • Input: Raw 16kHz waveform (zero-mean unit-variance normalized)
  • Output: 256 language probabilities
  • Parameters: 315M
  • Weight format: Original HF safetensors (no conversion needed)
  • Weight loading: Conv1d axis swap + weight_norm precomputation done at load time

Notes

  • Weights are loaded directly from facebook/mms-lid-256 HF cache β€” no separate conversion step
  • This repo contains only the model card and MLX implementation reference
  • ANE causes 13x slowdown β€” use Metal GPU (.cpuAndGPU) only
  • Sustained inference on M1 degrades to ~400ms due to thermal throttling (48 transformer layers). M2+ should be better.