elmadany committed on
Commit f9cc242 · verified · 1 Parent(s): 2f34d11

Update README.md

Files changed (1):
  1. README.md +116 -102
README.md CHANGED
@@ -1,102 +1,116 @@
- ---
- license: cc-by-nc-4.0
- base_model: facebook/mms-1b-all
- tags:
- - automatic-speech-recognition
- - /mnt/home/elmadany/elmadany_workspace/African_speechT5/jasmine-raid/elmadany_work/HF_format/SimbaBench_tasks_ft
- - mms
- - generated_from_trainer
- metrics:
- - wer
- model-index:
- - name: mms-1b-all
-   results: []
- ---
-
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- # mms-1b-all
-
- This model is a fine-tuned version of [facebook/mms-1b-all](https://huggingface.co/facebook/mms-1b-all) on the /MNT/HOME/ELMADANY/ELMADANY_WORKSPACE/AFRICAN_SPEECHT5/JASMINE-RAID/ELMADANY_WORK/HF_FORMAT/SIMBABENCH_TASKS_FT - ASR_FT_DATA_BATCH_2 dataset.
- It achieves the following results on the evaluation set:
- - Loss: 0.2214
- - Wer: 0.2988
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 0.001
- - train_batch_size: 4
- - eval_batch_size: 2
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 4
- - gradient_accumulation_steps: 4
- - total_train_batch_size: 64
- - total_eval_batch_size: 8
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: linear
- - lr_scheduler_warmup_steps: 1000
- - num_epochs: 30.0
-
- ### Training results
-
- | Training Loss | Epoch | Step | Validation Loss | Wer |
- |:-------------:|:-----:|:-----:|:---------------:|:------:|
- | 6.7354 | 1.0 | 610 | 0.5166 | 0.5255 |
- | 0.7031 | 2.0 | 1221 | 0.4340 | 0.4949 |
- | 0.6258 | 3.0 | 1831 | 0.3713 | 0.4372 |
- | 0.5844 | 4.0 | 2442 | 0.3527 | 0.4214 |
- | 0.5615 | 5.0 | 3053 | 0.3429 | 0.4238 |
- | 0.5829 | 2.0 | 4884 | 0.3453 | 0.4082 |
- | 0.5516 | 3.0 | 7327 | 0.3249 | 0.3986 |
- | 0.5313 | 4.0 | 9769 | 0.3149 | 0.3898 |
- | 0.5141 | 5.0 | 12212 | 0.3006 | 0.3754 |
- | 0.4991 | 6.0 | 14654 | 0.2959 | 0.3730 |
- | 0.4916 | 7.0 | 17097 | 0.2914 | 0.3667 |
- | 0.48 | 8.0 | 19539 | 0.2791 | 0.3528 |
- | 0.4708 | 9.0 | 21982 | 0.2808 | 0.3624 |
- | 0.4659 | 10.0 | 24424 | 0.2730 | 0.3518 |
- | 0.4587 | 11.0 | 26867 | 0.2708 | 0.3511 |
- | 0.4541 | 12.0 | 29309 | 0.2630 | 0.3371 |
- | 0.4442 | 13.0 | 31752 | 0.2606 | 0.3394 |
- | 0.439 | 14.0 | 34194 | 0.2592 | 0.3384 |
- | 0.4334 | 15.0 | 36637 | 0.2499 | 0.3300 |
- | 0.4304 | 16.0 | 39079 | 0.2492 | 0.3279 |
- | 0.4247 | 17.0 | 41522 | 0.2433 | 0.3255 |
- | 0.4197 | 18.0 | 43964 | 0.2407 | 0.3214 |
- | 0.4139 | 19.0 | 46407 | 0.2419 | 0.3172 |
- | 0.4091 | 20.0 | 48849 | 0.2378 | 0.3190 |
- | 0.4064 | 21.0 | 51292 | 0.2345 | 0.3150 |
- | 0.4026 | 22.0 | 53734 | 0.2319 | 0.3141 |
- | 0.4015 | 23.0 | 56177 | 0.2299 | 0.3070 |
- | 0.3941 | 24.0 | 58619 | 0.2278 | 0.3064 |
- | 0.3913 | 25.0 | 61062 | 0.2250 | 0.3044 |
- | 0.3902 | 26.0 | 63504 | 0.2270 | 0.3061 |
- | 0.3847 | 27.0 | 65947 | 0.2237 | 0.3025 |
- | 0.3826 | 28.0 | 68389 | 0.2214 | 0.2988 |
- | 0.3797 | 29.0 | 70832 | 0.2213 | 0.3001 |
- | 0.3786 | 29.99 | 73260 | 0.2208 | 0.2989 |
-
- ### Framework versions
-
- - Transformers 4.33.2
- - Pytorch 2.0.1+cu117
- - Datasets 3.5.1
- - Tokenizers 0.13.3
+ <div align="center">
+
+ <img src="https://africa.dlnlp.ai/simba/images/VoC_simba" alt="VoC Simba Models Logo">
+
+ [![EMNLP 2025 Paper](https://img.shields.io/badge/EMNLP_2025-Paper-B31B1B?style=for-the-badge&logo=arxiv&logoColor=B31B1B&labelColor=FFCDD2)](https://aclanthology.org/2025.emnlp-main.559/)
+ [![Official Website](https://img.shields.io/badge/Official-Website-2EA44F?style=for-the-badge&logo=googlechrome&logoColor=2EA44F&labelColor=C8E6C9)](https://africa.dlnlp.ai/simba/)
+ [![SimbaBench](https://img.shields.io/badge/SimbaBench-Benchmark-8A2BE2?style=for-the-badge&logo=googlecharts&logoColor=8A2BE2&labelColor=E1BEE7)](#simbabench)
+ [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-FFD21E?style=for-the-badge&logoColor=black&labelColor=FFF9C4)](https://huggingface.co/collections/UBC-NLP/simba-speech-series)
+ [![YouTube Video](https://img.shields.io/badge/YouTube-Video-FF0000?style=for-the-badge&logo=youtube&logoColor=FF0000&labelColor=FFCCBC)](#demo)
+
+ </div>
+
+ ## *Bridging the Digital Divide for African AI*
+
+ **Voice of a Continent** is a comprehensive open-source ecosystem designed to bring African languages to the forefront of artificial intelligence. By providing a unified suite of benchmarking tools and state-of-the-art models, we ensure that the future of speech technology is inclusive, representative, and accessible to over a billion people.
+
+ ## Best-in-Class Multilingual Models
+
+ Introduced in our EMNLP 2025 paper *[Voice of a Continent](https://aclanthology.org/2025.emnlp-main.559/)*, the **Simba Series** represents the current state-of-the-art for African speech AI.
+
+ - **Unified Suite:** Models optimized for African languages.
+ - **Superior Accuracy:** Outperforms generic multilingual models by leveraging SimbaBench's high-quality, domain-diverse datasets.
+ - **Multitask Capability:** Designed for high performance in ASR (Automatic Speech Recognition) and TTS (Text-to-Speech).
+ - **Inclusion-First:** Specifically built to mitigate the "digital divide" by empowering speakers of underrepresented languages.
+
+ The **Simba** family consists of state-of-the-art models fine-tuned using SimbaBench. These models achieve superior performance by leveraging dataset quality, domain diversity, and language family relationships.
+
+ ### 🗣️✍️ Simba-ASR
+ > **The New Standard for African Speech-to-Text**
+
+ **🎯 Task:** `Automatic Speech Recognition` – powering high-accuracy transcription across the continent.
+
+ **🌍 Language Coverage (43 African languages)**
+ > **Amharic** (`amh`), **Arabic** (`ara`), **Asante Twi** (`asanti`), **Bambara** (`bam`), **Baoulé** (`bau`), **Bemba** (`bem`), **Ewe** (`ewe`), **Fanti** (`fat`), **Fon** (`fon`), **French** (`fra`), **Ganda** (`lug`), **Hausa** (`hau`), **Igbo** (`ibo`), **Kabiye** (`kab`), **Kinyarwanda** (`kin`), **Kongo** (`kon`), **Lingala** (`lin`), **Luba-Katanga** (`lub`), **Luo** (`luo`), **Malagasy** (`mlg`), **Mossi** (`mos`), **Northern Sotho** (`nso`), **Nyanja** (`nya`), **Oromo** (`orm`), **Portuguese** (`por`), **Shona** (`sna`), **Somali** (`som`), **Southern Sotho** (`sot`), **Swahili** (`swa`), **Swati** (`ssw`), **Tigrinya** (`tir`), **Tsonga** (`tso`), **Tswana** (`tsn`), **Twi** (`twi`), **Umbundu** (`umb`), **Venda** (`ven`), **Wolof** (`wol`), **Xhosa** (`xho`), **Yoruba** (`yor`), **Zulu** (`zul`), **Tamazight** (`tzm`), **Sango** (`sag`), **Dinka** (`din`).
+
+ **🏗️ Base Architectures**
+
+ - **Simba-S** (SeamlessM4T-v2-MT) – *Top Performer*
+ - **Simba-W** (Whisper-v3-large)
+ - **Simba-X** (Wav2Vec2-XLS-R-2b)
+ - **Simba-M** (MMS-1b-all)
+ - **Simba-H** (AfriHuBERT)
+
+ | **ASR Models** | **Architecture** | **🤗 Hugging Face Model Card** | **Status** |
+ |----------------|:----------------:|:------------------------------:|:----------:|
+ | 🔥**Simba-S**🔥 | SeamlessM4T-v2 | 🤗 [https://huggingface.co/UBC-NLP/Simba-S](https://huggingface.co/UBC-NLP/Simba-S) | ✅ Released |
+ | 🔥**Simba-W**🔥 | Whisper | 🤗 [https://huggingface.co/UBC-NLP/Simba-W](https://huggingface.co/UBC-NLP/Simba-W) | ✅ Released |
+ | 🔥**Simba-X**🔥 | Wav2Vec2 | 🤗 [https://huggingface.co/UBC-NLP/Simba-X](https://huggingface.co/UBC-NLP/Simba-X) | ✅ Released |
+ | 🔥**Simba-M**🔥 | MMS | 🤗 [https://huggingface.co/UBC-NLP/Simba-M](https://huggingface.co/UBC-NLP/Simba-M) | ✅ Released |
+ | 🔥**Simba-H**🔥 | HuBERT | 🤗 [https://huggingface.co/UBC-NLP/Simba-H](https://huggingface.co/UBC-NLP/Simba-H) | ✅ Released |
+
+ - **Simba-S** (based on SeamlessM4T-v2-MT) emerged as the best-performing ASR model overall.
+
+ **🧩 Usage Example**
+
+ You can easily run inference using the Hugging Face `transformers` library.
+
+ ```python
+ from transformers import pipeline
+
+ # Load Simba-S for ASR. The other Simba models load the same way:
+ # `UBC-NLP/Simba-W`, `UBC-NLP/Simba-X`, `UBC-NLP/Simba-H`, `UBC-NLP/Simba-M`
+ asr_pipeline = pipeline(
+     "automatic-speech-recognition",
+     model="UBC-NLP/Simba-S"
+ )
+
+ # For `UBC-NLP/Simba-M` only: load its African multilingual adapter first
+ # asr_pipeline.model.load_adapter("multilingual_african")
+
+ # Transcribe audio from a file path or URL
+ result = asr_pipeline("https://africa.dlnlp.ai/simba/audio/afr_Lwazi_afr_test_idx3889.wav")
+ print(result["text"])
+
+ # Transcribe a raw audio array (1-D float array of 16 kHz mono samples)
+ result = asr_pipeline({
+     "array": audio_array,
+     "sampling_rate": 16_000
+ })
+ print(result["text"])
+ ```
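The array-input form above assumes `audio_array` already holds 16 kHz mono samples. A minimal, NumPy-only sketch of preparing such an array (the helper name `resample_to_16k` and the naive linear-interpolation approach are illustrative; real code would typically use `librosa` or `torchaudio` for resampling):

```python
import numpy as np

def resample_to_16k(audio, orig_sr):
    """Naively resample a 1-D waveform to the 16 kHz rate the pipeline expects."""
    if orig_sr == 16_000:
        return audio
    duration = len(audio) / orig_sr
    n_target = int(round(duration * 16_000))
    # Time stamps of the source and target sample grids
    src_t = np.linspace(0.0, duration, num=len(audio), endpoint=False)
    dst_t = np.linspace(0.0, duration, num=n_target, endpoint=False)
    return np.interp(dst_t, src_t, audio).astype(np.float32)

# Example: one second of a 440 Hz tone recorded at 44.1 kHz
sr = 44_100
t = np.arange(sr) / sr
audio_44k = np.sin(2 * np.pi * 440 * t).astype(np.float32)

audio_array = resample_to_16k(audio_44k, sr)  # now 16,000 samples, float32
```

The result can then be passed to the pipeline as `{"array": audio_array, "sampling_rate": 16_000}`.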
+ Get started with Simba models in minutes using our interactive Colab notebook: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://github.com/UBC-NLP/simba/edit/main/simba_models.ipynb)
+
+ ## Citation
+
+ If you use the Simba models or the SimbaBench benchmark in a scientific publication, or if you find the resources on this website useful, please cite our paper.
+
+ ```bibtex
+ @inproceedings{elmadany-etal-2025-voice,
+     title = "Voice of a Continent: Mapping {A}frica{'}s Speech Technology Frontier",
+     author = "Elmadany, AbdelRahim A.  and
+       Kwon, Sang Yun  and
+       Toyin, Hawau Olamide  and
+       Alcoba Inciarte, Alcides  and
+       Aldarmaki, Hanan  and
+       Abdul-Mageed, Muhammad",
+     editor = "Christodoulopoulos, Christos  and
+       Chakraborty, Tanmoy  and
+       Rose, Carolyn  and
+       Peng, Violet",
+     booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
+     month = nov,
+     year = "2025",
+     address = "Suzhou, China",
+     publisher = "Association for Computational Linguistics",
+     url = "https://aclanthology.org/2025.emnlp-main.559/",
+     doi = "10.18653/v1/2025.emnlp-main.559",
+     pages = "11039--11061",
+     ISBN = "979-8-89176-332-6",
+ }
+ ```