Language Identification Model for 32 Most-Spoken Languages

This repository contains resources for 32-language audio language identification (LID) based on Meta AI's MMS (Massively Multilingual Speech) models.

🎯 Quick Start — Use the SOTA Model Directly

The best-performing model for this task is facebook/mms-lid-126, which achieves 97.2% accuracy on FLEURS across 126 languages and covers all 32 target languages.

from transformers import pipeline

classifier = pipeline("audio-classification", model="facebook/mms-lid-126")
result = classifier("/path/to/audio.wav")
print(result[0]["label"])  # top-scoring label as an ISO 639-3 code, e.g. "eng"

Or use the standalone inference script in this repo:

python inference.py --audio sample.wav

📊 Supported Languages (ISO 639-3)

Code	Language	Code	Language
`eng`	English	`kor`	Korean
`cmn`	Mandarin Chinese	`ita`	Italian
`hin`	Hindi	`tha`	Thai
`spa`	Spanish	`guj`	Gujarati
`fra`	French	`fas`	Persian (Farsi)
`ara`	Arabic	`pol`	Polish
`ben`	Bengali	`ukr`	Ukrainian
`por`	Portuguese	`mal`	Malayalam
`rus`	Russian	`kan`	Kannada
`urd`	Urdu	`ory`	Odia (Oriya)
`ind`	Indonesian	`mya`	Burmese
`deu`	German	`pan`	Punjabi
`jpn`	Japanese	`nld`	Dutch
`mar`	Marathi	`pus`	Pashto
`tel`	Telugu
`tur`	Turkish
`tam`	Tamil
`vie`	Vietnamese
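
Because facebook/mms-lid-126 scores 126 languages, its top prediction can fall outside the 32 targets listed above. The following is a minimal sketch of one way to handle that, assuming predictions arrive as a list of {"label", "score"} dicts (the shape returned by the audio-classification pipeline). The names TARGET_LANGS and restrict_to_targets are hypothetical and not part of this repo, and the example scores are made up for illustration.

```python
# The 32 target codes from the table above.
TARGET_LANGS = {
    "eng", "cmn", "hin", "spa", "fra", "ara", "ben", "por", "rus", "urd",
    "ind", "deu", "jpn", "mar", "tel", "tur", "tam", "vie", "kor", "ita",
    "tha", "guj", "fas", "pol", "ukr", "mal", "kan", "ory", "mya", "pan",
    "nld", "pus",
}


def restrict_to_targets(predictions, targets=TARGET_LANGS):
    """Drop out-of-set labels and renormalize the remaining scores to sum to 1."""
    kept = [p for p in predictions if p["label"] in targets]
    total = sum(p["score"] for p in kept)
    if not kept or total == 0:
        return []  # nothing in the target set was scored
    rescaled = [{"label": p["label"], "score": p["score"] / total} for p in kept]
    return sorted(rescaled, key=lambda p: p["score"], reverse=True)


# Illustrative input only: "swe" is not one of the 32 targets.
example = [
    {"label": "swe", "score": 0.5},
    {"label": "deu", "score": 0.3},
    {"label": "nld", "score": 0.2},
]
best = restrict_to_targets(example)[0]
print(best["label"], round(best["score"], 2))
```

To apply this to real audio, request scores for every class from the Quick Start classifier by passing a top_k at least as large as the number of classes, for example classifier("/path/to/audio.wav", top_k=126), and pass the result to restrict_to_targets.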

🎮 Live Demo

Try it instantly: denizaybey/lid-32-demo

🏋️ Fine-Tuned Model

A fine-tuned model, denizaybey/lid-32-mms1b, is also available. It was trained only on the 32 target languages from the google/fleurs dataset, with the following setup:

Base model: facebook/mms-1b (wav2vec2, 1B params)
Dataset: FLEURS (clean, professionally recorded audio)
Training: AdamW, lr=3e-5, effective batch=32, 5 epochs with early stopping
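
The "effective batch" and "early stopping" settings above can be illustrated with a short sketch. Only the learning rate, epoch count, and effective batch of 32 come from this card. The per-device batch size, gradient accumulation steps, device count, and patience below are assumptions chosen to make the arithmetic concrete, and the validation losses are invented, not results from train.py. TrainConfig and epochs_run are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class TrainConfig:
    learning_rate: float = 3e-5   # from the model card
    num_epochs: int = 5           # from the model card
    per_device_batch: int = 4     # assumption
    grad_accum_steps: int = 8     # assumption
    num_devices: int = 1          # assumption
    patience: int = 2             # assumption: epochs without improvement

    @property
    def effective_batch(self) -> int:
        # Samples contributing to each optimizer step.
        return self.per_device_batch * self.grad_accum_steps * self.num_devices


def epochs_run(val_losses, patience):
    """Return the number of epochs completed before early stopping triggers."""
    best = float("inf")
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch
    return len(val_losses)


cfg = TrainConfig()
print(cfg.effective_batch)
# Invented validation losses, one per epoch, for illustration only.
print(epochs_run([0.9, 0.5, 0.6, 0.7, 0.4], cfg.patience))
```

With these assumed values, 4 x 8 x 1 gives the effective batch of 32 stated above. Any other combination with the same product would match the card equally well.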

📁 Files

inference.py — Standalone CLI for inference
train.py — Full training script for reproduction
README.md — This file

🔬 Technical Details

Attribute	Value
Architecture	wav2vec2 (MMS-1B)
Parameters	~1B
Sampling Rate	16 kHz
Max Audio Length	30 seconds
Output Format	ISO 639-3 language codes
Training Data	google/fleurs (32 language configs)
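
The sampling rate and maximum length in the table imply a cap of 16,000 x 30 = 480,000 samples per clip. The sketch below shows that arithmetic, assuming longer clips are simply truncated. This card does not say whether inference.py or train.py truncate, pad, or chunk long audio, so treat the behavior as an assumption. truncate_waveform is a hypothetical helper, and resampling is left to an audio library.

```python
SAMPLE_RATE = 16_000                      # Hz, from the table above
MAX_SECONDS = 30                          # from the table above
MAX_SAMPLES = SAMPLE_RATE * MAX_SECONDS   # 480,000 samples


def truncate_waveform(samples, sample_rate):
    """Keep at most the first 30 seconds of a mono 16 kHz waveform.

    Assumption: over-length audio is cut, not chunked or rejected.
    """
    if sample_rate != SAMPLE_RATE:
        raise ValueError(
            f"expected {SAMPLE_RATE} Hz audio, got {sample_rate} Hz; resample first"
        )
    return samples[:MAX_SAMPLES]


# A 31-second clip of silence is cut to 30 seconds; a short clip is unchanged.
long_clip = [0.0] * (SAMPLE_RATE * 31)
short_clip = [0.0] * SAMPLE_RATE
print(len(truncate_waveform(long_clip, SAMPLE_RATE)))
print(len(truncate_waveform(short_clip, SAMPLE_RATE)))
```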

📚 References

Pratap et al., "Scaling Speech Technology to 1,000+ Languages", 2023 — arXiv:2305.13516
Conneau et al., "FLEURS: Few-shot Learning Evaluation of Universal Representations of Speech", 2022 — arXiv:2205.12446
