Automatic Speech Recognition
Commit f055bc4 by indicnodeai and meta-bot · 0 parents

Duplicate from facebook/omniASR-CTC-1B

Co-authored-by: Meta Platforms <meta-bot@users.noreply.huggingface.co>

Files changed (4)
  1. .gitattributes +35 -0
  2. README.md +139 -0
  3. omniASR-CTC-1B.pt +3 -0
  4. omniASR_tokenizer.model +3 -0
.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
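Files matching these patterns are stored through Git LFS, which is why `omniASR-CTC-1B.pt` (matching `*.pt`) and `omniASR_tokenizer.model` (matching `*.model`) show up in this commit as three-line pointer files. The sketch below approximates the matching with Python's `fnmatch.fnmatchcase` on file basenames, using an illustrative subset of the patterns; it is an assumption-laden simplification, since real gitattributes matching follows gitignore-style rules and also handles path patterns such as `saved_model/**/*`.

```python
from fnmatch import fnmatchcase

# Illustrative subset of the LFS patterns above (not the full .gitattributes)
LFS_PATTERNS = ("*.bin", "*.model", "*.pt", "*.pth", "*.safetensors", "*.tar.*", "*tfevents*")


def is_lfs_tracked(path: str, patterns: tuple[str, ...] = LFS_PATTERNS) -> bool:
    """Approximate gitattributes matching for slash-free patterns.

    Patterns without a slash are matched against the file's basename, so
    "*.pt" also matches "checkpoints/model.pt". Path patterns such as
    "saved_model/**/*" are not handled by this sketch.
    """
    basename = path.rsplit("/", 1)[-1]
    return any(fnmatchcase(basename, pattern) for pattern in patterns)


for name in ["README.md", ".gitattributes", "omniASR-CTC-1B.pt", "omniASR_tokenizer.model"]:
    print(f"{name}: {'Git LFS' if is_lfs_tracked(name) else 'regular Git'}")
```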
README.md ADDED
@@ -0,0 +1,139 @@
+ ---
+ license: apache-2.0
+ datasets:
+ - facebook/omnilingual-asr-corpus
+ pipeline_tag: automatic-speech-recognition
+ ---
+
+ # Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
+
+ <div align="center" style="line-height: 1.2; font-size:16px; margin-bottom: 30px;">
+ <a href="https://huggingface.co/facebook" target="_blank" style="margin: 2px;">
+ 🤗 Hugging Face
+ </a> |
+ <a href="https://github.com/facebookresearch/omnilingual-asr" target="_blank" style="margin: 2px;">
+ 🐙 GitHub
+ </a> |
+ <a href="https://huggingface.co/spaces/facebook/omniasr-transcriptions" target="_blank" style="margin: 2px;">
+ 🤖️ Demo
+ </a> |
+ <a href="https://ai.meta.com/research/publications/omnilingual-asr-open-source-multilingual-speech-recognition-for-1600-languages/" target="_blank" style="margin: 2px;">
+ 📃 Paper
+ </a> |
+ <a href="https://ai.meta.com/blog/omnilingual-asr-advancing-automatic-speech-recognition/" target="_blank" style="margin: 2px;">
+ 📝 Blogpost
+ </a> |
+ <a href="https://github.com/facebookresearch/omnilingual-asr/blob/main/LICENSE" style="margin: 2px;">
+ 📄 License: Apache 2.0
+ </a>
+ </div>
+
+ # Model Card for omniASR-CTC-1B
+
+ ## Model Description
+
+ This model is part of the **Omnilingual ASR** family released by Meta AI. The original suite includes:
+
+ <!-- TODO: add new tokenizer (we'll get two tokenizers); add missing speed numbers -->
+ | Model Name | Features | Parameters | Download Size (FP32) | Inference VRAM¹ | Real-Time Factor¹ (relative speed)² |
+ |---------------------|---------------|------------:|---------------:|---------------:|-----------:|
+ | [`omniASR_W2V_300M`](https://huggingface.co/facebook/omniASR-W2V-300M) | SSL | 317_390_592 | 1.2 GiB | | |
+ | [`omniASR_W2V_1B`](https://huggingface.co/facebook/omniASR-W2V-1B) | SSL | 965_514_752 | 3.6 GiB | | |
+ | [`omniASR_W2V_3B`](https://huggingface.co/facebook/omniASR-W2V-3B) | SSL | 3_064_124_672 | 12.0 GiB | | |
+ | [`omniASR_W2V_7B`](https://huggingface.co/facebook/omniASR-W2V-7B) | SSL | 6_488_487_168 | 25.0 GiB | | |
+ | [`omniASR_CTC_300M`](https://huggingface.co/facebook/omniASR-CTC-300M) | ASR | 325_494_996 | 1.3 GiB | ~2 GiB | 0.001 (96x) |
+ | [`omniASR_CTC_1B`](https://huggingface.co/facebook/omniASR-CTC-1B) | ASR | 975_065_300 | 3.7 GiB | ~3 GiB | 0.002 (48x) |
+ | [`omniASR_CTC_3B`](https://huggingface.co/facebook/omniASR-CTC-3B) | ASR | 3_080_423_636 | 12.0 GiB | ~8 GiB | 0.003 (32x) |
+ | [`omniASR_CTC_7B`](https://huggingface.co/facebook/omniASR-CTC-7B) | ASR | 6_504_786_132 | 25.0 GiB | ~15 GiB | 0.006 (16x) |
+ | [`omniASR_LLM_300M`](https://huggingface.co/facebook/omniASR-LLM-300M) | ASR with optional language conditioning | 1_627_603_584 | 6.1 GiB | ~5 GiB | 0.090 (~1x) |
+ | [`omniASR_LLM_1B`](https://huggingface.co/facebook/omniASR-LLM-1B) | ASR with optional language conditioning | 2_275_710_592 | 8.5 GiB | ~6 GiB | 0.091 (~1x) |
+ | [`omniASR_LLM_3B`](https://huggingface.co/facebook/omniASR-LLM-3B) | ASR with optional language conditioning | 4_376_679_040 | 17.0 GiB | ~10 GiB | 0.093 (~1x) |
+ | [`omniASR_LLM_7B`](https://huggingface.co/facebook/omniASR-LLM-7B) | ASR with optional language conditioning | 7_801_041_536 | 30.0 GiB | ~17 GiB | 0.092 (~1x) |
+ | [`omniASR_LLM_7B_ZS`](https://huggingface.co/facebook/omniASR-LLM-7B-ZS) | Zero-Shot ASR | 7_810_900_608 | 30.0 GiB | ~20 GiB | 0.194 (~0.5x) |
+
+ ¹ Measured with batch size 1, 30 s of audio, BF16 precision, on an A100 GPU.
+
+ ² Speed relative to `omniASR_LLM_7B`.
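Two of the table's columns follow from simple arithmetic: FP32 stores each parameter in 4 bytes, so the download size is roughly parameters × 4 bytes, and the real-time factor (RTF) is processing time divided by audio duration. The sketch below applies both to `omniASR_CTC_1B`; the helper names are illustrative and not part of the omnilingual-asr API, and the size counts weights only. It yields about 3.63 GiB, in line with the table's rounded 3.7 GiB and the 3,900,517,028-byte checkpoint pointer listed further down. Because the published RTFs are rounded, a ratio such as 0.092 / 0.002 ≈ 46 differs slightly from the table's 48x.

```python
BYTES_PER_FP32_PARAM = 4
GIB = 2**30  # 1 GiB = 1,073,741,824 bytes


def fp32_size_gib(num_params: int) -> float:
    """Size of num_params FP32 weights in GiB (weights only, no file overhead)."""
    return num_params * BYTES_PER_FP32_PARAM / GIB


def processing_seconds(rtf: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration, so time = RTF * duration."""
    return rtf * audio_seconds


params_ctc_1b = 975_065_300  # omniASR_CTC_1B parameter count from the table
print(f"omniASR_CTC_1B FP32 weights: {fp32_size_gib(params_ctc_1b):.2f} GiB")  # 3.63 GiB
print(f"30 s clip at RTF 0.002: {processing_seconds(0.002, 30.0):.2f} s")  # 0.06 s

# Relative speed versus omniASR_LLM_7B (RTF 0.092), computed from rounded RTFs
print(f"relative speed: ~{0.092 / 0.002:.0f}x")
```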
+
+ ---
+
+ ## Installation
+
+ The models were developed using [fairseq2](https://github.com/facebookresearch/fairseq2), a research-focused sequence modeling toolkit. While we provide a **reference** inference pipeline that works across platforms, audio support requires [libsndfile](https://github.com/facebookresearch/fairseq2?tab=readme-ov-file#system-dependencies) (Mac: `brew install libsndfile`; Windows may need additional [setup](https://github.com/facebookresearch/fairseq2?tab=readme-ov-file#installing-on-windows)).
+
+ ```bash
+ # using pip
+ pip install omnilingual-asr
+
+ # using uv
+ uv add omnilingual-asr
+ ```
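Because audio loading depends on libsndfile, a quick pre-flight check can surface a missing system library before the first transcription fails. The sketch below uses the standard library's `ctypes.util.find_library`, which only reports whether the platform's usual library search can locate a library named `sndfile`. This is a heuristic assumption on our part, not how fairseq2 itself loads the library, so a miss suggests, but does not prove, that libsndfile is absent.

```python
import ctypes.util


def libsndfile_visible() -> bool:
    """Heuristic: can the platform's library search locate libsndfile?

    find_library("sndfile") looks for e.g. libsndfile.so / libsndfile.dylib /
    sndfile.dll in the usual locations and returns None if nothing is found.
    """
    return ctypes.util.find_library("sndfile") is not None


if libsndfile_visible():
    print("libsndfile found")
else:
    print("libsndfile not found (on macOS: brew install libsndfile)")
```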
+
+ ## Inference
+
+ ```python
+ from omnilingual_asr.models.inference.pipeline import ASRInferencePipeline
+
+ # Select a checkpoint by its model name from the table above
+ pipeline = ASRInferencePipeline(model_card="omniASR_LLM_7B")
+
+ # One language code per audio file, in the same order
+ audio_files = ["/path/to/eng_audio1.flac", "/path/to/deu_audio2.wav"]
+ lang = ["eng_Latn", "deu_Latn"]
+ transcriptions = pipeline.transcribe(audio_files, lang=lang, batch_size=2)
+ ```
+
+ ## Supported Languages
+
+ The full list of 1600+ supported languages can be accessed [programmatically](https://github.com/facebookresearch/omnilingual-asr/blob/main/src/omnilingual_asr/models/wav2vec2_llama/lang_ids.py):
+
+ ```python
+ from omnilingual_asr.models.wav2vec2_llama.lang_ids import supported_langs
+
+ # Print all supported languages
+ print(f"Total supported languages: {len(supported_langs)}")
+ print(supported_langs)
+
+ # Check if a specific language is supported
+ if "eng_Latn" in supported_langs:
+     print("English (Latin script) is supported!")
+ ```
+
+ Language identifiers follow the format `{language_code}_{script}`, for example `eng_Latn` (English, Latin script) and `cmn_Hans` (Mandarin Chinese, Simplified).
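A small parser makes the `{language_code}_{script}` convention concrete. It assumes a three-letter lowercase language code and a four-letter script code with an initial capital, which fits `eng_Latn` and `cmn_Hans`; that exact pattern is our assumption, so `supported_langs` remains the authoritative membership check.

```python
import re

# Assumed shape: 3 lowercase letters, "_", then a capitalized 4-letter script code
LANG_ID = re.compile(r"(?P<language>[a-z]{3})_(?P<script>[A-Z][a-z]{3})")


def parse_lang_id(lang_id: str) -> tuple[str, str]:
    """Split an identifier like "eng_Latn" into ("eng", "Latn")."""
    match = LANG_ID.fullmatch(lang_id)
    if match is None:
        raise ValueError(f"not in {{language_code}}_{{script}} form: {lang_id!r}")
    return match["language"], match["script"]


print(parse_lang_id("eng_Latn"))  # ('eng', 'Latn')
print(parse_lang_id("cmn_Hans"))  # ('cmn', 'Hans')
```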
+
+ ---
+
+ ## Training
+
+ To fine-tune the released checkpoints on your own data, follow the [data preparation guide](https://github.com/facebookresearch/omnilingual-asr/blob/main/workflows/dataprep/README.md) and then the [fine-tuning recipe guide](https://github.com/facebookresearch/omnilingual-asr/blob/main/workflows/recipes/wav2vec2/asr/README.md).
+
+ ---
+
+ ## Citation
+
+ **BibTeX:**
+
+ ```bibtex
+ @misc{omnilingualasr2025,
+ title={{Omnilingual ASR}: Open-Source Multilingual Speech Recognition for 1600+ Languages},
+ author={{Omnilingual ASR Team} and Keren, Gil and Kozhevnikov, Artyom and Meng, Yen and Ropers, Christophe and Setzler, Matthew and Wang, Skyler and Adebara, Ife and Auli, Michael and Balioglu, Can and Chan, Kevin and Cheng, Chierh and Chuang, Joe and Droof, Caley and Duppenthaler, Mark and Duquenne, Paul-Ambroise and Erben, Alexander and Gao, Cynthia and Mejia Gonzalez, Gabriel and Lyu, Kehan and Miglani, Sagar and Pratap, Vineel and Sadagopan, Kaushik Ram and Saleem, Safiyyah and Turkatenko, Arina and Ventayol-Boada, Albert and Yong, Zheng-Xin and Chung, Yu-An and Maillard, Jean and Moritz, Rashel and Mourachko, Alexandre and Williamson, Mary and Yates, Shireen},
+ year={2025},
+ url={https://ai.meta.com/research/publications/omnilingual-asr-open-source-multilingual-speech-recognition-for-1600-languages/},
+ }
+ ```
+
+ * **Developed by:** Meta AI / Omnilingual ASR Team ([GitHub][1])
+ * **Model type:** End-to-end automatic speech recognition model: a wav2vec2-style encoder with either a CTC head (as in this checkpoint) or a decoder (the `omniASR_LLM_*` encoder-decoder checkpoints), depending on the checkpoint.
+ * **Language(s) (NLP):** 1,600+ languages across the Omnilingual ASR family; the accompanying `facebook/omnilingual-asr-corpus` dataset specifically covers **348 under-served languages** across many writing systems (Latin, Arabic, Devanagari, etc.). ([GitHub][1])
+ * **License:** Apache-2.0 for the model and code; CC-BY-4.0 for the `facebook/omnilingual-asr-corpus` dataset. ([GitHub][1])
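For the CTC checkpoints, the model emits one token prediction per audio frame, and the transcript comes from collapsing consecutive repeats and then dropping a special blank token. The sketch below shows generic greedy (best-path) CTC decoding on toy ids; the blank id `0` and the two-letter vocabulary are hypothetical, and this is not the omnilingual-asr decoding code, which maps ids back to text with its own tokenizer.

```python
from itertools import groupby

BLANK = 0  # hypothetical blank-token id; the real tokenizer defines its own


def ctc_greedy_decode(frame_ids: list[int], blank: int = BLANK) -> list[int]:
    """Greedy CTC decoding: collapse consecutive repeats, then drop blanks."""
    collapsed = [token for token, _ in groupby(frame_ids)]
    return [token for token in collapsed if token != blank]


# Per-frame argmax ids over a toy vocabulary
vocab = {1: "h", 2: "i"}
frames = [0, 1, 1, 0, 0, 2, 2, 2, 0]
print("".join(vocab[t] for t in ctc_greedy_decode(frames)))  # hi

# A blank between identical ids keeps them as two separate tokens
print(ctc_greedy_decode([1, 0, 1]))  # [1, 1]
```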
+
+ ---
+
+ [1]: https://github.com/facebookresearch/omnilingual-asr?tab=readme-ov-file "GitHub - facebookresearch/omnilingual-asr: Omnilingual ASR Open-Source Multilingual Speech Recognition for 1600+ Languages"
+ [2]: https://huggingface.co/datasets/facebook/omnilingual-asr-corpus/blob/main/README.md "README.md · facebook/omnilingual-asr-corpus at main"
+ [3]: https://huggingface.co/datasets/facebook/omnilingual-asr-corpus "facebook/omnilingual-asr-corpus · Datasets at ..."
+ [4]: https://venturebeat.com/ai/meta-returns-to-open-source-ai-with-omnilingual-asr-models-that-can "Meta returns to open source AI with Omnilingual ASR ..."
+ [5]: https://huggingface.co/spaces/facebook/omniasr-transcriptions "Omnilingual ASR Media Transcription"
+ [6]: https://huggingface.co/collections/bezzam/omnilingual-asr-1-600-languages "Omnilingual ASR (1600+ Languages) - a bezzam Collection"
+
omniASR-CTC-1B.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e8564fa59dab7caedbcdb54ab7fb9bd6c96989f4d19add2ad81ddd969716952c
+ size 3900517028
omniASR_tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b954cc166b0c9e0271b953fa226fa27ca706a25b7029e84579fe2c60a2b451fe
+ size 87562
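The checkpoint and tokenizer entries above are Git LFS pointer files (spec v1): each identifies the real file by its SHA-256 `oid` and its byte `size`. After downloading, a file can be checked against its pointer. The sketch below parses the pointer's `key value` lines and streams the file through `hashlib.sha256`; the helper names are illustrative, and the local path in the usage comment is a placeholder.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields: dict[str, str] = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(path: Path, pointer_text: str, chunk_size: int = 1 << 20) -> bool:
    """Check a downloaded file's size and SHA-256 against its LFS pointer."""
    fields = parse_lfs_pointer(pointer_text)
    algo, _, expected_oid = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    if path.stat().st_size != int(fields["size"]):
        return False  # cheap size check before hashing multi-GB files
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Read in chunks so large checkpoints are not loaded into memory at once
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_oid


TOKENIZER_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:b954cc166b0c9e0271b953fa226fa27ca706a25b7029e84579fe2c60a2b451fe
size 87562
"""

# Usage (placeholder path):
# matches_pointer(Path("omniASR_tokenizer.model"), TOKENIZER_POINTER)
```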