MMS-LID 256 (ONNX)

ONNX export of MMS-LID (Massively Multilingual Speech - Language Identification) for 256 languages, enabling on-device or server inference without a PyTorch dependency.

Contents

  • ONNX model file(s) for the Wav2Vec2-based LID classifier
  • Label mapping (e.g. labels.json or mms_lid_id2label.json) from class index to ISO 639-3 language code

Input / Output

  • Input: Raw waveform, 16 kHz mono float32, fixed length of 10 seconds (160,000 samples)
  • Output: Logits over 256 language classes; argmax gives the predicted language index. Map index to ISO 639-3 code using the included labels file.
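Because the export expects a fixed 160,000-sample input, shorter clips must be padded and longer clips truncated before inference. A minimal sketch with NumPy (the helper name `pad_or_trim` is illustrative, not part of this repo):

```python
import numpy as np

TARGET_SAMPLES = 160_000  # 10 s at 16 kHz, the fixed input length of this export

def pad_or_trim(waveform) -> np.ndarray:
    """Return exactly TARGET_SAMPLES float32 samples.

    Assumes `waveform` is already 16 kHz mono. Shorter clips are
    zero-padded on the right; longer clips are truncated.
    """
    waveform = np.asarray(waveform, dtype=np.float32).flatten()
    if waveform.size >= TARGET_SAMPLES:
        return waveform[:TARGET_SAMPLES]
    return np.pad(waveform, (0, TARGET_SAMPLES - waveform.size))
```

Resampling to 16 kHz (if the source audio uses another rate) must happen before this step, e.g. with a library such as librosa or torchaudio.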

Usage

  1. Load the ONNX model with your runtime (e.g. ONNX Runtime, or convert further to Core ML for iOS).
  2. Feed 10 seconds of 16 kHz mono float32 audio.
  3. Take argmax of the logits output and look up the language code in the labels file.
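The steps above can be sketched with ONNX Runtime's Python API. The function name `identify_language`, the model file name `mms_lid_256.onnx`, and the labels-file layout are illustrative assumptions, not names fixed by this repo:

```python
import numpy as np

def identify_language(session, waveform, id2label):
    """Predict the language of a single clip.

    Assumes `session` behaves like an onnxruntime.InferenceSession,
    `waveform` is a float32 array of 160,000 samples (10 s, 16 kHz mono),
    and `id2label` maps class index to ISO 639-3 code, loaded from the
    labels file in this repo.
    """
    input_name = session.get_inputs()[0].name
    # Add a batch dimension: (160000,) -> (1, 160000)
    logits = session.run(None, {input_name: waveform[np.newaxis, :]})[0]
    return id2label[int(np.argmax(logits, axis=-1)[0])]

# Example wiring (file names are assumptions; substitute the files
# actually shipped in this repo):
#
# import json
# import onnxruntime as ort
# session = ort.InferenceSession("mms_lid_256.onnx")
# with open("labels.json") as f:
#     id2label = {int(k): v for k, v in json.load(f).items()}
# print(identify_language(session, audio, id2label))
```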

Related repos

Languages   Format    Repo
256         ONNX      this repo
126         Core ML   mms-lid-126-coreml
256         Core ML   mms-lid-256-coreml
512         Core ML   mms-lid-512-coreml

Citation

@article{pratap2023mms,
  title={Scaling Speech Technology to 1,000+ Languages},
  author={Vineel Pratap and Andros Tjandra and Bowen Shi and Paden Tomasello and Arun Babu and Sayani Kundu and Ali Elkahky and Zhaoheng Ni and Apoorv Vyas and Maryam Fazel-Zarandi and Alexei Baevski and Yossi Adi and Xiaohui Zhang and Wei-Ning Hsu and Alexis Conneau and Michael Auli},
  journal={arXiv},
  year={2023}
}

License

CC-BY-NC-4.0 (inherited from MMS-LID).
