
Language Identification Model for the 32 Most-Spoken Languages

This repository contains resources for 32-language audio language identification (LID) based on Meta AI's MMS (Massively Multilingual Speech) models.

🎯 Quick Start: Use the SOTA Model Directly

The best-performing model for this task is facebook/mms-lid-126, which achieves 97.2% accuracy on FLEURS across 126 languages and covers all 32 target languages.

```python
from transformers import pipeline

classifier = pipeline("audio-classification", model="facebook/mms-lid-126")
result = classifier("/path/to/audio.wav")  # list of {"label", "score"} dicts, highest score first
print(result[0]["label"])  # ISO 639-3 code, e.g. "eng"
```

Or use the standalone inference script in this repo:

```shell
python inference.py --audio sample.wav
```

📊 Supported Languages (ISO 639-3)

| Code | Language | Code | Language |
|---|---|---|---|
| eng | English | kor | Korean |
| cmn | Mandarin Chinese | ita | Italian |
| hin | Hindi | tha | Thai |
| spa | Spanish | guj | Gujarati |
| fra | French | fas | Persian (Farsi) |
| ara | Arabic | pol | Polish |
| ben | Bengali | ukr | Ukrainian |
| por | Portuguese | mal | Malayalam |
| rus | Russian | kan | Kannada |
| urd | Urdu | ory | Oriya |
| ind | Indonesian | mya | Burmese |
| deu | German | pan | Punjabi |
| jpn | Japanese | nld | Dutch |
| mar | Marathi | pus | Pashto |
| tel | Telugu | | |
| tur | Turkish | | |
| tam | Tamil | | |
| vie | Vietnamese | | |
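Because facebook/mms-lid-126 scores 126 languages, its top prediction can fall outside the 32 listed above. The sketch below is a hypothetical post-processing helper (`restrict_to_targets` is not part of this repository) that keeps only the 32 target codes from pipeline output and renormalizes their scores. It assumes the model's labels match the codes in the table exactly; label sets can differ (for example macrolanguage codes such as `ara` or `fas` versus individual-language codes), so check `classifier.model.config.id2label` before relying on it. The pipeline returns only five candidates by default, so pass a larger `top_k` (e.g. `classifier(path, top_k=126)`) to give the filter more to work with.

```python
# The 32 target languages from the table (ISO 639-3 code -> name).
# Assumption: these codes match the model's labels; verify against id2label.
TARGET_LANGUAGES = {
    "eng": "English", "cmn": "Mandarin Chinese", "hin": "Hindi", "spa": "Spanish",
    "fra": "French", "ara": "Arabic", "ben": "Bengali", "por": "Portuguese",
    "rus": "Russian", "urd": "Urdu", "ind": "Indonesian", "deu": "German",
    "jpn": "Japanese", "mar": "Marathi", "tel": "Telugu", "tur": "Turkish",
    "tam": "Tamil", "vie": "Vietnamese", "kor": "Korean", "ita": "Italian",
    "tha": "Thai", "guj": "Gujarati", "fas": "Persian (Farsi)", "pol": "Polish",
    "ukr": "Ukrainian", "mal": "Malayalam", "kan": "Kannada", "ory": "Oriya",
    "mya": "Burmese", "pan": "Punjabi", "nld": "Dutch", "pus": "Pashto",
}


def restrict_to_targets(predictions):
    """Hypothetical helper: drop non-target labels and renormalize scores.

    `predictions` is a list of {"label": str, "score": float} dicts, the shape
    returned by the transformers audio-classification pipeline.
    """
    kept = [p for p in predictions if p["label"] in TARGET_LANGUAGES]
    total = sum(p["score"] for p in kept)
    if total == 0:
        return []  # no target language among the returned candidates
    ranked = sorted(kept, key=lambda p: p["score"], reverse=True)
    return [
        {
            "label": p["label"],
            "name": TARGET_LANGUAGES[p["label"]],
            "score": p["score"] / total,  # scores now sum to 1 over kept labels
        }
        for p in ranked
    ]
```

Renormalizing only over the returned candidates keeps the helper independent of the model; it does not change the model's ranking among the 32 languages.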

🎮 Live Demo

Try it instantly: denizaybey/lid-32-demo

๐Ÿ‹๏ธ Fine-Tuned Model

A fine-tuned version (denizaybey/lid-32-mms1b) is also available, trained specifically on the 32 target languages from the google/fleurs dataset using:

  • Base model: facebook/mms-1b (wav2vec2, 1B params)
  • Dataset: FLEURS (clean, professionally recorded audio)
  • Training: AdamW, lr=3e-5, effective batch size 32, up to 5 epochs with early stopping
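The training settings above give an effective batch size of 32 but not how it is split across steps and devices. A common way to reach it is gradient accumulation; the sketch below computes the number of accumulation steps and implements patience-based early stopping. The per-device batch size used in the usage note and the `patience`/`min_delta` defaults are assumptions, not values taken from train.py. In the Hugging Face Trainer the same ideas correspond to `gradient_accumulation_steps` in `TrainingArguments` and to `EarlyStoppingCallback`; whether train.py uses them is not stated here.

```python
import math

# Values stated in this README.
LEARNING_RATE = 3e-5
EFFECTIVE_BATCH = 32
MAX_EPOCHS = 5


def grad_accum_steps(effective_batch, per_device_batch, num_devices=1):
    """Accumulation steps so that per_device * devices * steps == effective batch."""
    per_step = per_device_batch * num_devices
    if effective_batch % per_step != 0:
        raise ValueError("effective batch must be a multiple of per-device batch x devices")
    return effective_batch // per_step


class EarlyStopping:
    """Stop once validation loss fails to improve for `patience` evaluations in a row.

    Assumption: patience=2 and min_delta=0.0 are illustrative defaults only.
    """

    def __init__(self, patience=2, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = math.inf
        self.bad_evals = 0

    def step(self, val_loss):
        """Record one evaluation; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss  # improvement: reset the counter
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience
```

With a hypothetical per-device batch of 8 on one GPU, `grad_accum_steps(EFFECTIVE_BATCH, 8)` returns 4. Call `stopper.step(val_loss)` after each epoch's evaluation and leave the training loop once it returns `True`; with `MAX_EPOCHS = 5`, training ends after epoch 5 at the latest.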

๐Ÿ“ Files

  • inference.py โ€” Standalone CLI for inference
  • train.py โ€” Full training script for reproduction
  • README.md โ€” This file

🔬 Technical Details

| Attribute | Value |
|---|---|
| Architecture | wav2vec2 (MMS-1B) |
| Parameters | ~1B |
| Sampling Rate | 16 kHz |
| Max Audio Length | 30 seconds |
| Output Format | ISO 639-3 language codes |
| Training Data | google/fleurs (32 language configs) |
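The table fixes the model input at 16 kHz and at most 30 seconds, but does not say how longer or differently sampled clips are handled. The sketch below assumes clips are resampled to 16 kHz and then truncated to their first 30 seconds (16,000 × 30 = 480,000 samples); `resample_linear` and `prepare_clip` are hypothetical helpers, not functions from inference.py. It uses plain linear interpolation for brevity, which applies no anti-aliasing filter; a real pipeline would more likely load audio with `librosa.load(path, sr=16000)` or use `torchaudio.functional.resample`.

```python
TARGET_SR = 16_000                      # model input sampling rate (Hz), from the table
MAX_SECONDS = 30                        # maximum clip length, from the table
MAX_SAMPLES = TARGET_SR * MAX_SECONDS   # 480,000 samples


def resample_linear(samples, orig_sr, target_sr=TARGET_SR):
    """Resample a mono signal by linear interpolation (no anti-aliasing filter)."""
    if orig_sr == target_sr or not samples:
        return [float(x) for x in samples]
    n_out = int(round(len(samples) * target_sr / orig_sr))
    step = orig_sr / target_sr  # input samples advanced per output sample
    out = []
    for i in range(n_out):
        pos = i * step
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)  # clamp at the last input sample
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


def prepare_clip(samples, orig_sr):
    """Assumed preprocessing: resample to 16 kHz, then keep at most the first 30 s."""
    return resample_linear(samples, orig_sr)[:MAX_SAMPLES]
```

For example, one second of 48 kHz audio becomes 16,000 samples, and a 31-second 16 kHz clip is cut to 480,000 samples.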

📚 References

  • Pratap et al., "Scaling Speech Technology to 1,000+ Languages", 2023. arXiv:2305.13516
  • Conneau et al., "FLEURS: Few-shot Learning Evaluation of Universal Representations of Speech", 2022. arXiv:2205.12446