Paper: Scaling Speech Technology to 1,000+ Languages (arXiv:2305.13516)
This repository contains resources for 32-language audio language identification (LID) based on Meta AI's MMS (Massively Multilingual Speech) models.
The best-performing model for this task is facebook/mms-lid-126, which achieves 97.2% accuracy on FLEURS across 126 languages and covers all 32 target languages.
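```python
from transformers import pipeline

# Decoding audio from a file path requires ffmpeg to be installed
classifier = pipeline("audio-classification", model="facebook/mms-lid-126")
result = classifier("/path/to/audio.wav")  # list of {"label", "score"} dicts, best first
print(result[0]["label"])  # ISO 639-3 code, e.g. "eng"
```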
Or use the standalone inference script in this repo:

```bash
python inference.py --audio sample.wav
```
| Code | Language | Code | Language |
|---|---|---|---|
| eng | English | kor | Korean |
| cmn | Mandarin Chinese | ita | Italian |
| hin | Hindi | tha | Thai |
| spa | Spanish | guj | Gujarati |
| fra | French | fas | Persian (Farsi) |
| ara | Arabic | pol | Polish |
| ben | Bengali | ukr | Ukrainian |
| por | Portuguese | mal | Malayalam |
| rus | Russian | kan | Kannada |
| urd | Urdu | ory | Oriya |
| ind | Indonesian | mya | Burmese |
| deu | German | pan | Punjabi |
| jpn | Japanese | nld | Dutch |
| mar | Marathi | pus | Pashto |
| tel | Telugu | | |
| tur | Turkish | | |
| tam | Tamil | | |
| vie | Vietnamese | | |
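Because facebook/mms-lid-126 scores all 126 of its languages, a caller who only cares about the 32 codes in the table above may want to discard the other labels and renormalize the remaining scores. The following is a minimal sketch, assuming the pipeline is called with `top_k=126` so that it returns a score for every label. `TARGET_LANGUAGES` and `restrict_to_targets` are hypothetical helpers for illustration, not part of this repo.

```python
# Hypothetical helper data: the 32 target ISO 639-3 codes from the table above.
TARGET_LANGUAGES = {
    "eng": "English", "cmn": "Mandarin Chinese", "hin": "Hindi", "spa": "Spanish",
    "fra": "French", "ara": "Arabic", "ben": "Bengali", "por": "Portuguese",
    "rus": "Russian", "urd": "Urdu", "ind": "Indonesian", "deu": "German",
    "jpn": "Japanese", "mar": "Marathi", "tel": "Telugu", "tur": "Turkish",
    "tam": "Tamil", "vie": "Vietnamese", "kor": "Korean", "ita": "Italian",
    "tha": "Thai", "guj": "Gujarati", "fas": "Persian (Farsi)", "pol": "Polish",
    "ukr": "Ukrainian", "mal": "Malayalam", "kan": "Kannada", "ory": "Oriya",
    "mya": "Burmese", "pan": "Punjabi", "nld": "Dutch", "pus": "Pashto",
}


def restrict_to_targets(results, targets=TARGET_LANGUAGES):
    """Keep only target-language labels and renormalize their scores to sum to 1.

    `results` is a list of {"label": str, "score": float} dicts, the shape
    returned by the transformers audio-classification pipeline.
    Returns (code, name, probability) tuples, highest probability first.
    """
    kept = [r for r in results if r["label"] in targets]
    total = sum(r["score"] for r in kept)
    if total == 0:
        return []  # none of the target languages received any score
    ranked = sorted(kept, key=lambda r: r["score"], reverse=True)
    return [(r["label"], targets[r["label"]], r["score"] / total) for r in ranked]


# In practice: example = classifier("/path/to/audio.wav", top_k=126)
example = [
    {"label": "eng", "score": 0.6},
    {"label": "sco", "score": 0.3},  # not in the 32-language set, so it is dropped
    {"label": "deu", "score": 0.1},
]
print(restrict_to_targets(example)[0])
```

Renormalizing turns the scores into probabilities conditioned on the audio being one of the 32 target languages; if out-of-set speech is possible, the discarded mass (`1 - total`) is worth checking before trusting the top label.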
Try it instantly: denizaybey/lid-32-demo
A fine-tuned version (denizaybey/lid-32-mms1b) is also available, trained specifically on the 32 target languages from the google/fleurs dataset, starting from the base model facebook/mms-1b (wav2vec2, 1B params).

Repository files:

- inference.py: standalone CLI for inference
- train.py: full training script for reproduction
- README.md: this file

Fine-tuned model details:

| Attribute | Value |
|---|---|
| Architecture | wav2vec2 (MMS-1B) |
| Parameters | ~1B |
| Sampling Rate | 16 kHz |
| Max Audio Length | 30 seconds |
| Output Format | ISO 639-3 language codes |
| Training Data | google/fleurs (32 language configs) |
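The table above lists a 16 kHz sampling rate and a 30-second maximum input length. The following is a minimal sketch of preparing a longer recording that has already been decoded into a NumPy array. It downmixes the audio to mono, resamples it with plain linear interpolation, and splits it into windows of at most 30 s. Linear interpolation is a crude stand-in; a proper polyphase or sinc resampler, such as those in torchaudio or librosa, is preferable in practice. Each chunk could then be classified separately and the per-chunk scores averaged. `prepare_chunks` and `average_scores` are hypothetical helpers, not part of this repo, and averaging is only one possible aggregation strategy.

```python
import numpy as np

TARGET_SR = 16_000  # model sampling rate from the table above
MAX_SECONDS = 30    # maximum audio length from the table above


def prepare_chunks(audio, sr, target_sr=TARGET_SR, max_seconds=MAX_SECONDS):
    """Downmix to mono, resample to target_sr, and split into <= max_seconds chunks.

    `audio` is a float array shaped (samples,) or (samples, channels).
    Resampling uses linear interpolation, which is only a rough approximation.
    """
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim == 2:
        audio = audio.mean(axis=1)  # average the channels to get mono
    if sr != target_sr:
        n_out = int(round(len(audio) / sr * target_sr))
        t_in = np.arange(len(audio)) / sr        # input sample times in seconds
        t_out = np.arange(n_out) / target_sr     # output sample times in seconds
        audio = np.interp(t_out, t_in, audio).astype(np.float32)
    chunk_len = target_sr * max_seconds
    return [audio[i:i + chunk_len] for i in range(0, len(audio), chunk_len)]


def average_scores(per_chunk_results):
    """Average {"label", "score"} lists from several chunks into one ranking.

    A label missing from a chunk's results counts as 0 for that chunk.
    """
    totals = {}
    for results in per_chunk_results:
        for r in results:
            totals[r["label"]] = totals.get(r["label"], 0.0) + r["score"]
    n = len(per_chunk_results)
    return sorted(((label, s / n) for label, s in totals.items()),
                  key=lambda item: item[1], reverse=True)


# 70 s of silent stereo audio at 44.1 kHz -> chunks of 30 s, 30 s and 10 s at 16 kHz
stereo = np.zeros((44_100 * 70, 2), dtype=np.float32)
chunks = prepare_chunks(stereo, sr=44_100)
print([len(c) / TARGET_SR for c in chunks])  # [30.0, 30.0, 10.0]
```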