elmadany committed
Commit 4701d9d · verified · Parent: 44f4147

Initial model upload

README.md ADDED
@@ -0,0 +1,203 @@
+ ---
+ language:
+ - ak # Akuapim Twi
+ - tw # Asante Twi
+ - aeb # Tunisian Arabic
+ - af # Afrikaans
+ - am # Amharic
+ - ar # Arabic
+ - bas # Basaa
+ - bem # Bemba
+ - dav # Taita
+ - dyu # Dyula
+ - en # English
+ - pcm # Nigerian Pidgin
+ - ee # Ewe
+ - fat # Fanti
+ - fon # Fon
+ - fuc # Pulaar
+ - ff # Pular
+ - gaa # Ga
+ - ha # Hausa
+ - ig # Igbo
+ - kab # Kabyle
+ - rw # Kinyarwanda
+ - kln # Kalenjin
+ - ln # Lingala
+ - loz # Lozi
+ - lg # Luganda
+ - luo # Luo
+ - mlq # Western Maninkakan
+ - nr # South Ndebele
+ - nso # Northern Sotho
+ - ny # Chichewa
+ - st # Southern Sotho
+ - srr # Serer
+ - ss # Swati
+ - sus # Susu
+ - sw # Kiswahili/Swahili
+ - tig # Tigre
+ - ti # Tigrinya
+ - toi # Tonga
+ - tn # Tswana
+ - ts # Tsonga
+ - tw # Twi
+ - ve # Venda
+ - wo # Wolof
+ - xh # Xhosa
+ - yo # Yoruba
+ - zgh # Standard Moroccan Tamazight
+ - zu # Zulu
+
+ license: cc-by-4.0
+ tags:
+ - automatic-speech-recognition
+ - audio
+ - speech
+ - african-languages
+ - multilingual
+ - simba
+ - low-resource
+ - speech-recognition
+ - asr
+ - spoken-language-identification
+ - language-identification
+ datasets:
+ - UBC-NLP/SimbaBench
+ metrics:
+ - wer
+ - cer
+ - accuracy
+ library_name: transformers
+ pipeline_tag: audio-classification
+ ---
+
+ <div align="center">
+
+ <img src="https://africa.dlnlp.ai/simba/images/VoC_logo.png" alt="VoC Logo">
+
+ [![EMNLP 2025 Paper](https://img.shields.io/badge/EMNLP_2025-Paper-B31B1B?style=for-the-badge&logo=arxiv&logoColor=B31B1B&labelColor=FFCDD2)](https://aclanthology.org/2025.emnlp-main.559/)
+ [![Official Website](https://img.shields.io/badge/Official-Website-2EA44F?style=for-the-badge&logo=googlechrome&logoColor=2EA44F&labelColor=C8E6C9)](https://africa.dlnlp.ai/simba/)
+ [![SimbaBench](https://img.shields.io/badge/SimbaBench-Benchmark-8A2BE2?style=for-the-badge&logo=googlecharts&logoColor=8A2BE2&labelColor=E1BEE7)](https://huggingface.co/spaces/UBC-NLP/SimbaBench)
+ [![GitHub Repository](https://img.shields.io/badge/GitHub-Repository-181717?style=for-the-badge&logo=github&logoColor=181717&labelColor=E0E0E0)](https://github.com/UBC-NLP/simba)
+ [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-FFD21E?style=for-the-badge&logoColor=181717&labelColor=FFF9C4)](https://huggingface.co/collections/UBC-NLP/simba-speech-series)
+ [![Hugging Face Dataset](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Dataset-FFD21E?style=for-the-badge&logoColor=181717&labelColor=FFF9C4)](https://huggingface.co/datasets/UBC-NLP/SimbaBench_dataset)
+
+ </div>
+
+ ## *Bridging the Digital Divide for African AI*
+
+ **Voice of a Continent** is a comprehensive open-source ecosystem designed to bring African languages to the forefront of artificial intelligence. By providing a unified suite of benchmarking tools and state-of-the-art models, we aim to make the future of speech technology inclusive, representative, and accessible to over a billion people.
+
+ ## Best-in-Class Multilingual Models
+
+ <img src="https://africa.dlnlp.ai/simba/images/VoC_simba" alt="VoC Simba Models Logo">
+
+ Introduced in our EMNLP 2025 paper *[Voice of a Continent](https://aclanthology.org/2025.emnlp-main.559/)*, the **Simba Series** represents the current state-of-the-art for African speech AI.
+
+ - **Unified Suite:** Models optimized for African languages.
+ - **Superior Accuracy:** Outperforms generic multilingual models by leveraging SimbaBench's high-quality, domain-diverse datasets.
+ - **Multitask Capability:** Designed for high performance in ASR (Automatic Speech Recognition) and TTS (Text-to-Speech).
+ - **Inclusion-First:** Specifically built to mitigate the "digital divide" by empowering speakers of underrepresented languages.
+
+ The **Simba** family consists of state-of-the-art models fine-tuned using SimbaBench. These models achieve superior performance by leveraging dataset quality, domain diversity, and language family relationships.
+
+ ### 🔍 Simba-SLID (Spoken Language Identification)
+ * **🎯 Task:** `Spoken Language Identification` for intelligent input routing.
+ * **🌍 Language Coverage (49 African languages)**
+ > **Akuapim Twi** (`Akuapim-twi`), **Asante Twi** (`Asante-twi`), **Tunisian Arabic** (`aeb`), **Afrikaans** (`afr`), **Amharic** (`amh`), **Arabic** (`ara`), **Basaa** (`bas`), **Bemba** (`bem`), **Taita** (`dav`), **Dyula** (`dyu`), **English** (`eng`), **Nigerian Pidgin** (`eng-zul`), **Ewe** (`ewe`), **Fanti** (`fat`), **Fon** (`fon`), **Pulaar** (`fuc`), **Pular** (`fuf`), **Ga** (`gaa`), **Hausa** (`hau`), **Igbo** (`ibo`), **Kabyle** (`kab`), **Kinyarwanda** (`kin`), **Kalenjin** (`kln`), **Lingala** (`lin`), **Lozi** (`loz`), **Luganda** (`lug`), **Luo** (`luo`), **Western Maninkakan** (`mlq`), **South Ndebele** (`nbl`), **Northern Sotho** (`nso`), **Chichewa** (`nya`), **Southern Sotho** (`sot`), **Serer** (`srr`), **Swati** (`ssw`), **Susu** (`sus`), **Kiswahili** (`swa`), **Swahili** (`swh`), **Tigre** (`tig`), **Tigrinya** (`tir`), **Tonga** (`toi`), **Tswana** (`tsn`), **Tsonga** (`tso`), **Twi** (`twi`), **Venda** (`ven`), **Wolof** (`wol`), **Xhosa** (`xho`), **Yoruba** (`yor`), **Standard Moroccan Tamazight** (`zgh`), **Zulu** (`zul`)
+
+ | **SLID Model** | **Architecture** | **Hugging Face Card** | **Status** |
+ | :--- | :--- | :---: | :---: |
+ | **Simba-SLID-49** 🔍 | HuBERT | 🤗 [https://huggingface.co/UBC-NLP/Simba-SLIS-49](https://huggingface.co/UBC-NLP/Simba-SLIS-49) | ✅ Released |
+
+ **🧩 Usage Example**
+
+ You can run inference with the Hugging Face `transformers` library:
+
+ ```python
+ from transformers import (
+     HubertForSequenceClassification,
+     AutoFeatureExtractor,
+     AutoProcessor
+ )
+ import torch
+
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+
+ model_id = "UBC-NLP/Simba-SLIS_49"
+ model = HubertForSequenceClassification.from_pretrained(model_id).to(device)
+
+ # HuBERT checkpoints may ship either a full processor or a bare feature extractor
+ try:
+     processor = AutoProcessor.from_pretrained(model_id)
+     print("Loaded Simba-SLIS_49 model with AutoProcessor")
+ except Exception:
+     processor = AutoFeatureExtractor.from_pretrained(model_id)
+     print("Loaded Simba-SLIS_49 model with AutoFeatureExtractor")
+
+ # Put the model in inference mode
+ model.eval()
+
+ audio_arrays = []  # add your 16 kHz mono audio arrays here
+ sample_rate = 16000
+
+ inputs = processor(audio_arrays, sampling_rate=sample_rate, return_tensors="pt", padding=True).to(device)
+
+ # Different checkpoints may expect slightly different input formats
+ try:
+     logits = model(**inputs).logits
+ except Exception as e:
+     # Fall back to an explicit input_values call if the generic call fails
+     if "input_values" in inputs:
+         logits = model(input_values=inputs.input_values).logits
+     else:
+         raise e
+
+ # Softmax probabilities over the 49 language labels
+ probs = torch.nn.functional.softmax(logits, dim=-1)
+
+ # Highest probability (confidence) and predicted label ID per utterance
+ confidence_values, pred_ids = torch.max(probs, dim=-1)
+
+ # Convert to Python lists
+ pred_ids = pred_ids.tolist()
+ confidence_values = confidence_values.cpu().tolist()
+
+ # Map label IDs to language codes
+ pred_labels = [model.config.id2label[i] for i in pred_ids]
+
+ print(pred_labels, confidence_values)
+ ```
+
171
+
172
+ ## Citation
173
+
174
+ If you use the Simba models or SimbaBench benchmark for your scientific publication, or if you find the resources in this website useful, please cite our paper.
175
+
176
+ ```bibtex
177
+
178
+ @inproceedings{elmadany-etal-2025-voice,
179
+ title = "Voice of a Continent: Mapping {A}frica{'}s Speech Technology Frontier",
180
+ author = "Elmadany, AbdelRahim A. and
181
+ Kwon, Sang Yun and
182
+ Toyin, Hawau Olamide and
183
+ Alcoba Inciarte, Alcides and
184
+ Aldarmaki, Hanan and
185
+ Abdul-Mageed, Muhammad",
186
+ editor = "Christodoulopoulos, Christos and
187
+ Chakraborty, Tanmoy and
188
+ Rose, Carolyn and
189
+ Peng, Violet",
190
+ booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
191
+ month = nov,
192
+ year = "2025",
193
+ address = "Suzhou, China",
194
+ publisher = "Association for Computational Linguistics",
195
+ url = "https://aclanthology.org/2025.emnlp-main.559/",
196
+ doi = "10.18653/v1/2025.emnlp-main.559",
197
+ pages = "11039--11061",
198
+ ISBN = "979-8-89176-332-6",
199
+ }
200
+
201
+ ```
202
+
203
+
config.json ADDED
@@ -0,0 +1,180 @@
+ {
+   "_name_or_path": "ajesujoba/AfriHuBERT",
+   "activation_dropout": 0.1,
+   "apply_spec_augment": true,
+   "architectures": [
+     "HubertForSequenceClassification"
+   ],
+   "attention_dropout": 0.1,
+   "bos_token_id": 1,
+   "classifier_proj_size": 256,
+   "conv_bias": false,
+   "conv_dim": [
+     512,
+     512,
+     512,
+     512,
+     512,
+     512,
+     512
+   ],
+   "conv_kernel": [
+     10,
+     3,
+     3,
+     3,
+     3,
+     2,
+     2
+   ],
+   "conv_pos_batch_norm": false,
+   "conv_stride": [
+     5,
+     2,
+     2,
+     2,
+     2,
+     2,
+     2
+   ],
+   "ctc_loss_reduction": "sum",
+   "ctc_zero_infinity": false,
+   "do_stable_layer_norm": false,
+   "eos_token_id": 2,
+   "feat_extract_activation": "gelu",
+   "feat_extract_dropout": 0.0,
+   "feat_extract_norm": "group",
+   "feat_proj_dropout": 0.1,
+   "feat_proj_layer_norm": true,
+   "final_dropout": 0.1,
+   "finetuning_task": "audio-classification",
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout": 0.1,
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "id2label": {
+     "0": "Akuapim-twi",
+     "1": "Asante-twi",
+     "10": "eng",
+     "11": "eng-zul",
+     "12": "ewe",
+     "13": "fat",
+     "14": "fon",
+     "15": "fuc",
+     "16": "fuf",
+     "17": "gaa",
+     "18": "hau",
+     "19": "ibo",
+     "2": "aeb",
+     "20": "kab",
+     "21": "kin",
+     "22": "kln",
+     "23": "lin",
+     "24": "loz",
+     "25": "lug",
+     "26": "luo",
+     "27": "mlq",
+     "28": "nbl",
+     "29": "nso",
+     "3": "afr",
+     "30": "nya",
+     "31": "sot",
+     "32": "srr",
+     "33": "ssw",
+     "34": "sus",
+     "35": "swa",
+     "36": "swh",
+     "37": "tig",
+     "38": "tir",
+     "39": "toi",
+     "4": "amh",
+     "40": "tsn",
+     "41": "tso",
+     "42": "twi",
+     "43": "ven",
+     "44": "wol",
+     "45": "xho",
+     "46": "yor",
+     "47": "zgh",
+     "48": "zul",
+     "5": "ara",
+     "6": "bas",
+     "7": "bem",
+     "8": "dav",
+     "9": "dyu"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "label2id": {
+     "Akuapim-twi": "0",
+     "Asante-twi": "1",
+     "aeb": "2",
+     "afr": "3",
+     "amh": "4",
+     "ara": "5",
+     "bas": "6",
+     "bem": "7",
+     "dav": "8",
+     "dyu": "9",
+     "eng": "10",
+     "eng-zul": "11",
+     "ewe": "12",
+     "fat": "13",
+     "fon": "14",
+     "fuc": "15",
+     "fuf": "16",
+     "gaa": "17",
+     "hau": "18",
+     "ibo": "19",
+     "kab": "20",
+     "kin": "21",
+     "kln": "22",
+     "lin": "23",
+     "loz": "24",
+     "lug": "25",
+     "luo": "26",
+     "mlq": "27",
+     "nbl": "28",
+     "nso": "29",
+     "nya": "30",
+     "sot": "31",
+     "srr": "32",
+     "ssw": "33",
+     "sus": "34",
+     "swa": "35",
+     "swh": "36",
+     "tig": "37",
+     "tir": "38",
+     "toi": "39",
+     "tsn": "40",
+     "tso": "41",
+     "twi": "42",
+     "ven": "43",
+     "wol": "44",
+     "xho": "45",
+     "yor": "46",
+     "zgh": "47",
+     "zul": "48"
+   },
+   "layer_norm_eps": 1e-05,
+   "layerdrop": 0.1,
+   "mask_feature_length": 10,
+   "mask_feature_min_masks": 0,
+   "mask_feature_prob": 0.0,
+   "mask_time_length": 10,
+   "mask_time_min_masks": 2,
+   "mask_time_prob": 0.05,
+   "model_type": "hubert",
+   "num_attention_heads": 12,
+   "num_conv_pos_embedding_groups": 16,
+   "num_conv_pos_embeddings": 128,
+   "num_feat_extract_layers": 7,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "tokenizer_class": "Wav2Vec2CTCTokenizer",
+   "torch_dtype": "float32",
+   "transformers_version": "4.48.1",
+   "use_weighted_layer_sum": false,
+   "vocab_size": 32
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b093d79e77272669d34041d5b010e0d79c6fa0e0222b94cf300f24019786eb14
+ size 378350268
preprocessor_config.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "do_normalize": true,
+   "feature_extractor_type": "Wav2Vec2FeatureExtractor",
+   "feature_size": 1,
+   "padding_side": "right",
+   "padding_value": 0,
+   "return_attention_mask": false,
+   "sampling_rate": 16000
+ }
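
These settings mean the feature extractor expects 16 kHz mono input, normalizes each waveform to zero mean and unit variance, and right-pads batches with zeros. A minimal sketch of the effect (random noise stands in for real speech):

```python
import numpy as np
from transformers import Wav2Vec2FeatureExtractor

# Recreate the extractor with the settings from preprocessor_config.json
extractor = Wav2Vec2FeatureExtractor(
    feature_size=1,
    sampling_rate=16000,
    padding_value=0.0,
    do_normalize=True,
    return_attention_mask=False,
)

# Two clips of different lengths
clips = [np.random.randn(16000), np.random.randn(8000)]
batch = extractor(clips, sampling_rate=16000, return_tensors="np", padding=True)

print(batch.input_values.shape)  # (2, 16000): the shorter clip is right-padded
```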
trainer_state.json ADDED
@@ -0,0 +1,552 @@
+ {
+   "best_metric": 0.4363306793570824,
+   "best_model_checkpoint": "./outputs_slid/ajesujoba/AfriHuBERT/checkpoint-1830",
+   "epoch": 29.99591836734694,
+   "eval_steps": 500,
+   "global_step": 5490,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     {
+       "epoch": 0.9959183673469387,
+       "grad_norm": 0.7721740007400513,
+       "learning_rate": 1.6666666666666667e-05,
+       "loss": 3.7874,
+       "step": 183
+     },
+     {
+       "epoch": 0.9959183673469387,
+       "eval_accuracy": 0.04349865165904561,
+       "eval_f1": 0.004599781789327515,
+       "eval_loss": 3.9047515392303467,
+       "eval_runtime": 28.6696,
+       "eval_samples_per_second": 297.493,
+       "eval_steps_per_second": 0.593,
+       "step": 183
+     },
+     {
+       "epoch": 1.9959183673469387,
+       "grad_norm": 0.8705180883407593,
+       "learning_rate": 3.3333333333333335e-05,
+       "loss": 3.0785,
+       "step": 366
+     },
+     {
+       "epoch": 1.9959183673469387,
+       "eval_accuracy": 0.16649079610739828,
+       "eval_f1": 0.07176290896776823,
+       "eval_loss": 3.3783769607543945,
+       "eval_runtime": 17.2721,
+       "eval_samples_per_second": 493.804,
+       "eval_steps_per_second": 0.984,
+       "step": 366
+     },
+     {
+       "epoch": 2.9959183673469387,
+       "grad_norm": 1.0882235765457153,
+       "learning_rate": 5e-05,
+       "loss": 1.9687,
+       "step": 549
+     },
+     {
+       "epoch": 2.9959183673469387,
+       "eval_accuracy": 0.41739946066361827,
+       "eval_f1": 0.23146127598121502,
+       "eval_loss": 2.4746670722961426,
+       "eval_runtime": 18.0262,
+       "eval_samples_per_second": 473.145,
+       "eval_steps_per_second": 0.943,
+       "step": 549
+     },
+     {
+       "epoch": 3.9959183673469387,
+       "grad_norm": 0.830756425857544,
+       "learning_rate": 4.983095894354858e-05,
+       "loss": 1.0019,
+       "step": 732
+     },
+     {
+       "epoch": 3.9959183673469387,
+       "eval_accuracy": 0.5312463360300153,
+       "eval_f1": 0.33343763170872565,
+       "eval_loss": 2.056602954864502,
+       "eval_runtime": 17.7176,
+       "eval_samples_per_second": 481.386,
+       "eval_steps_per_second": 0.959,
+       "step": 732
+     },
+     {
+       "epoch": 4.995918367346938,
+       "grad_norm": 1.341150164604187,
+       "learning_rate": 4.9326121764495596e-05,
+       "loss": 0.4955,
+       "step": 915
+     },
+     {
+       "epoch": 4.995918367346938,
+       "eval_accuracy": 0.5872904209168719,
+       "eval_f1": 0.3966908687854425,
+       "eval_loss": 2.070507526397705,
+       "eval_runtime": 17.7204,
+       "eval_samples_per_second": 481.309,
+       "eval_steps_per_second": 0.959,
+       "step": 915
+     },
+     {
+       "epoch": 5.995918367346938,
+       "grad_norm": 1.4914641380310059,
+       "learning_rate": 4.849231551964771e-05,
+       "loss": 0.3149,
+       "step": 1098
+     },
+     {
+       "epoch": 5.995918367346938,
+       "eval_accuracy": 0.608277640989565,
+       "eval_f1": 0.41221796485256534,
+       "eval_loss": 2.174699544906616,
+       "eval_runtime": 18.7633,
+       "eval_samples_per_second": 454.558,
+       "eval_steps_per_second": 0.906,
+       "step": 1098
+     },
+     {
+       "epoch": 6.995918367346938,
+       "grad_norm": 1.016514539718628,
+       "learning_rate": 4.734081600808531e-05,
+       "loss": 0.2324,
+       "step": 1281
+     },
+     {
+       "epoch": 6.995918367346938,
+       "eval_accuracy": 0.6051119709227342,
+       "eval_f1": 0.42029644401424293,
+       "eval_loss": 2.536925792694092,
+       "eval_runtime": 19.0396,
+       "eval_samples_per_second": 447.961,
+       "eval_steps_per_second": 0.893,
+       "step": 1281
+     },
+     {
+       "epoch": 7.995918367346938,
+       "grad_norm": 0.6603855490684509,
+       "learning_rate": 4.588719528532342e-05,
+       "loss": 0.1825,
+       "step": 1464
+     },
+     {
+       "epoch": 7.995918367346938,
+       "eval_accuracy": 0.5930355258529723,
+       "eval_f1": 0.37922494807809526,
+       "eval_loss": 2.6477608680725098,
+       "eval_runtime": 18.8796,
+       "eval_samples_per_second": 451.757,
+       "eval_steps_per_second": 0.9,
+       "step": 1464
+     },
+     {
+       "epoch": 8.995918367346938,
+       "grad_norm": 0.9515678286552429,
+       "learning_rate": 4.415111107797445e-05,
+       "loss": 0.1581,
+       "step": 1647
+     },
+     {
+       "epoch": 8.995918367346938,
+       "eval_accuracy": 0.5848282330871145,
+       "eval_f1": 0.3902253760074279,
+       "eval_loss": 2.7652101516723633,
+       "eval_runtime": 28.9433,
+       "eval_samples_per_second": 294.68,
+       "eval_steps_per_second": 0.587,
+       "step": 1647
+     },
+     {
+       "epoch": 9.995918367346938,
+       "grad_norm": 0.5628945827484131,
+       "learning_rate": 4.215604094671835e-05,
+       "loss": 0.1386,
+       "step": 1830
+     },
+     {
+       "epoch": 9.995918367346938,
+       "eval_accuracy": 0.6253957087583538,
+       "eval_f1": 0.4363306793570824,
+       "eval_loss": 2.5493264198303223,
+       "eval_runtime": 17.9843,
+       "eval_samples_per_second": 474.247,
+       "eval_steps_per_second": 0.945,
+       "step": 1830
+     },
+     {
+       "epoch": 10.995918367346938,
+       "grad_norm": 0.5759875178337097,
+       "learning_rate": 3.9928964792569655e-05,
+       "loss": 0.13,
+       "step": 2013
+     },
+     {
+       "epoch": 10.995918367346938,
+       "eval_accuracy": 0.6325477781686012,
+       "eval_f1": 0.42658322719917263,
+       "eval_loss": 2.668961763381958,
+       "eval_runtime": 17.9422,
+       "eval_samples_per_second": 475.359,
+       "eval_steps_per_second": 0.947,
+       "step": 2013
+     },
+     {
+       "epoch": 11.995918367346938,
+       "grad_norm": 0.7909059524536133,
+       "learning_rate": 3.7500000000000003e-05,
+       "loss": 0.1134,
+       "step": 2196
+     },
+     {
+       "epoch": 11.995918367346938,
+       "eval_accuracy": 0.5902215969046781,
+       "eval_f1": 0.40717895597633597,
+       "eval_loss": 2.847268581390381,
+       "eval_runtime": 18.1922,
+       "eval_samples_per_second": 468.828,
+       "eval_steps_per_second": 0.934,
+       "step": 2196
+     },
+     {
+       "epoch": 12.995918367346938,
+       "grad_norm": 0.6743366718292236,
+       "learning_rate": 3.490199415097892e-05,
+       "loss": 0.1078,
+       "step": 2379
+     },
+     {
+       "epoch": 12.995918367346938,
+       "eval_accuracy": 0.6048774768437097,
+       "eval_f1": 0.40486374255791757,
+       "eval_loss": 2.909079074859619,
+       "eval_runtime": 17.3197,
+       "eval_samples_per_second": 492.446,
+       "eval_steps_per_second": 0.982,
+       "step": 2379
+     },
+     {
+       "epoch": 13.995918367346938,
+       "grad_norm": 0.6435021758079529,
+       "learning_rate": 3.217008081777726e-05,
+       "loss": 0.0929,
+       "step": 2562
+     },
+     {
+       "epoch": 13.995918367346938,
+       "eval_accuracy": 0.6124985344120061,
+       "eval_f1": 0.402051577315403,
+       "eval_loss": 2.901214599609375,
+       "eval_runtime": 18.278,
+       "eval_samples_per_second": 466.625,
+       "eval_steps_per_second": 0.93,
+       "step": 2562
+     },
+     {
+       "epoch": 14.995918367346938,
+       "grad_norm": 0.7225833535194397,
+       "learning_rate": 2.9341204441673266e-05,
+       "loss": 0.0879,
+       "step": 2745
+     },
+     {
+       "epoch": 14.995918367346938,
+       "eval_accuracy": 0.5815453159807715,
+       "eval_f1": 0.3787146481538575,
+       "eval_loss": 2.927959442138672,
+       "eval_runtime": 19.3124,
+       "eval_samples_per_second": 441.634,
+       "eval_steps_per_second": 0.88,
+       "step": 2745
+     },
+     {
+       "epoch": 15.995918367346938,
+       "grad_norm": 0.519130527973175,
+       "learning_rate": 2.6453620722761896e-05,
+       "loss": 0.0875,
+       "step": 2928
+     },
+     {
+       "epoch": 15.995918367346938,
+       "eval_accuracy": 0.6116778051354204,
+       "eval_f1": 0.42421911178450894,
+       "eval_loss": 2.8714120388031006,
+       "eval_runtime": 18.6944,
+       "eval_samples_per_second": 456.233,
+       "eval_steps_per_second": 0.909,
+       "step": 2928
+     },
+     {
+       "epoch": 16.99591836734694,
+       "grad_norm": 0.5847667455673218,
+       "learning_rate": 2.3546379277238107e-05,
+       "loss": 0.083,
+       "step": 3111
+     },
+     {
+       "epoch": 16.99591836734694,
+       "eval_accuracy": 0.604994723883222,
+       "eval_f1": 0.40283444897722465,
+       "eval_loss": 2.9251325130462646,
+       "eval_runtime": 19.0241,
+       "eval_samples_per_second": 448.325,
+       "eval_steps_per_second": 0.894,
+       "step": 3111
+     },
+     {
+       "epoch": 17.99591836734694,
+       "grad_norm": 0.5335302948951721,
+       "learning_rate": 2.0658795558326743e-05,
+       "loss": 0.0743,
+       "step": 3294
+     },
+     {
+       "epoch": 17.99591836734694,
+       "eval_accuracy": 0.6085121350685895,
+       "eval_f1": 0.3982368535619314,
+       "eval_loss": 2.907853364944458,
+       "eval_runtime": 18.6799,
+       "eval_samples_per_second": 456.587,
+       "eval_steps_per_second": 0.91,
+       "step": 3294
+     },
+     {
+       "epoch": 18.99591836734694,
+       "grad_norm": 0.6082349419593811,
+       "learning_rate": 1.7829919182222752e-05,
+       "loss": 0.0743,
+       "step": 3477
+     },
+     {
+       "epoch": 18.99591836734694,
+       "eval_accuracy": 0.6140227459256654,
+       "eval_f1": 0.40722488778058297,
+       "eval_loss": 2.9568777084350586,
+       "eval_runtime": 18.2131,
+       "eval_samples_per_second": 468.288,
+       "eval_steps_per_second": 0.933,
+       "step": 3477
+     },
+     {
+       "epoch": 19.99591836734694,
+       "grad_norm": 0.5372836589813232,
+       "learning_rate": 1.5112603381728762e-05,
+       "loss": 0.0745,
+       "step": 3660
+     },
+     {
+       "epoch": 19.99591836734694,
+       "eval_accuracy": 0.6022980419744401,
+       "eval_f1": 0.3888247133789473,
+       "eval_loss": 3.133009910583496,
+       "eval_runtime": 19.5015,
+       "eval_samples_per_second": 437.351,
+       "eval_steps_per_second": 0.872,
+       "step": 3660
+     },
+     {
+       "epoch": 20.99591836734694,
+       "grad_norm": 0.4080846905708313,
+       "learning_rate": 1.2513768458995337e-05,
+       "loss": 0.0641,
+       "step": 3843
+     },
+     {
+       "epoch": 20.99591836734694,
+       "eval_accuracy": 0.6041739946066362,
+       "eval_f1": 0.4024604989707059,
+       "eval_loss": 3.086355447769165,
+       "eval_runtime": 18.9488,
+       "eval_samples_per_second": 450.109,
+       "eval_steps_per_second": 0.897,
+       "step": 3843
+     },
+     {
+       "epoch": 21.99591836734694,
+       "grad_norm": 0.6301392316818237,
+       "learning_rate": 1.0083788397924998e-05,
+       "loss": 0.0611,
+       "step": 4026
+     },
+     {
+       "epoch": 21.99591836734694,
+       "eval_accuracy": 0.611560558095908,
+       "eval_f1": 0.4250797125355288,
+       "eval_loss": 3.1089813709259033,
+       "eval_runtime": 19.3666,
+       "eval_samples_per_second": 440.398,
+       "eval_steps_per_second": 0.878,
+       "step": 4026
+     },
+     {
+       "epoch": 22.99591836734694,
+       "grad_norm": 0.7403397560119629,
+       "learning_rate": 7.855524510252082e-06,
+       "loss": 0.0618,
+       "step": 4209
+     },
+     {
+       "epoch": 22.99591836734694,
+       "eval_accuracy": 0.6095673584241997,
+       "eval_f1": 0.38478101379896623,
+       "eval_loss": 3.165566921234131,
+       "eval_runtime": 18.268,
+       "eval_samples_per_second": 466.882,
+       "eval_steps_per_second": 0.931,
+       "step": 4209
+     },
+     {
+       "epoch": 23.99591836734694,
+       "grad_norm": 0.6018996238708496,
+       "learning_rate": 5.8591102425065766e-06,
+       "loss": 0.0595,
+       "step": 4392
+     },
+     {
+       "epoch": 23.99591836734694,
+       "eval_accuracy": 0.6026497830929769,
+       "eval_f1": 0.4033953887201948,
+       "eval_loss": 3.182464122772217,
+       "eval_runtime": 18.8509,
+       "eval_samples_per_second": 452.446,
+       "eval_steps_per_second": 0.902,
+       "step": 4392
+     },
+     {
+       "epoch": 24.99591836734694,
+       "grad_norm": 0.7152003049850464,
+       "learning_rate": 4.1215436728432114e-06,
+       "loss": 0.0549,
+       "step": 4575
+     },
+     {
+       "epoch": 24.99591836734694,
+       "eval_accuracy": 0.6062844413178567,
+       "eval_f1": 0.3998774411315016,
+       "eval_loss": 3.2211174964904785,
+       "eval_runtime": 18.4161,
+       "eval_samples_per_second": 463.128,
+       "eval_steps_per_second": 0.923,
+       "step": 4575
+     },
+     {
+       "epoch": 25.99591836734694,
+       "grad_norm": 0.655457615852356,
+       "learning_rate": 2.6663224083492645e-06,
+       "loss": 0.0578,
+       "step": 4758
+     },
+     {
+       "epoch": 25.99591836734694,
+       "eval_accuracy": 0.6093328643451753,
+       "eval_f1": 0.40241682477511076,
+       "eval_loss": 3.154259204864502,
+       "eval_runtime": 19.0328,
+       "eval_samples_per_second": 448.122,
+       "eval_steps_per_second": 0.893,
+       "step": 4758
+     },
+     {
+       "epoch": 26.99591836734694,
+       "grad_norm": 0.8799217939376831,
+       "learning_rate": 1.5131258202183586e-06,
+       "loss": 0.0531,
+       "step": 4941
+     },
+     {
+       "epoch": 26.99591836734694,
+       "eval_accuracy": 0.611560558095908,
+       "eval_f1": 0.4136571965633068,
+       "eval_loss": 3.1584064960479736,
+       "eval_runtime": 19.4229,
+       "eval_samples_per_second": 439.121,
+       "eval_steps_per_second": 0.875,
+       "step": 4941
+     },
+     {
+       "epoch": 27.99591836734694,
+       "grad_norm": 0.5971439480781555,
+       "learning_rate": 6.775489140148194e-07,
+       "loss": 0.0556,
+       "step": 5124
+     },
+     {
+       "epoch": 27.99591836734694,
+       "eval_accuracy": 0.6054637120412709,
+       "eval_f1": 0.4107652565512037,
+       "eval_loss": 3.177584171295166,
+       "eval_runtime": 18.7393,
+       "eval_samples_per_second": 455.14,
+       "eval_steps_per_second": 0.907,
+       "step": 5124
+     },
+     {
+       "epoch": 28.99591836734694,
+       "grad_norm": 0.5378488898277283,
+       "learning_rate": 1.7089143397631958e-07,
+       "loss": 0.0592,
+       "step": 5307
+     },
+     {
+       "epoch": 28.99591836734694,
+       "eval_accuracy": 0.604994723883222,
+       "eval_f1": 0.41074234435939105,
+       "eval_loss": 3.1705150604248047,
+       "eval_runtime": 19.1621,
+       "eval_samples_per_second": 445.096,
+       "eval_steps_per_second": 0.887,
+       "step": 5307
+     },
+     {
+       "epoch": 29.99591836734694,
+       "grad_norm": 0.7799643278121948,
+       "learning_rate": 5.053357646223056e-12,
+       "loss": 0.0511,
+       "step": 5490
+     },
+     {
+       "epoch": 29.99591836734694,
+       "eval_accuracy": 0.6051119709227342,
+       "eval_f1": 0.41072097568738997,
+       "eval_loss": 3.1688835620880127,
+       "eval_runtime": 18.8369,
+       "eval_samples_per_second": 452.78,
+       "eval_steps_per_second": 0.902,
+       "step": 5490
+     },
+     {
+       "epoch": 29.99591836734694,
+       "step": 5490,
+       "total_flos": 5.117922821239409e+20,
+       "train_loss": 0.060013725892225034,
+       "train_runtime": 5236.9953,
+       "train_samples_per_second": 2153.277,
+       "train_steps_per_second": 1.048
+     }
+   ],
+   "logging_steps": 500,
+   "max_steps": 5490,
+   "num_input_tokens_seen": 0,
+   "num_train_epochs": 30,
+   "save_steps": 500,
+   "stateful_callbacks": {
+     "TrainerControl": {
+       "args": {
+         "should_epoch_stop": false,
+         "should_evaluate": false,
+         "should_log": false,
+         "should_save": true,
+         "should_training_stop": true
+       },
+       "attributes": {}
+     }
+   },
+   "total_flos": 5.117922821239409e+20,
+   "train_batch_size": 128,
+   "trial_name": null,
+   "trial_params": null
+ }
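
The `log_history` above pairs one training entry and one evaluation entry per epoch, and `best_metric` corresponds to the evaluation F1 at the `best_model_checkpoint` step. A minimal sketch (assuming `trainer_state.json` has been downloaded to the working directory) to pull out the evaluation curve:

```python
import json

with open("trainer_state.json") as f:
    state = json.load(f)

# Keep only the evaluation entries; training entries have no eval_accuracy
evals = [e for e in state["log_history"] if "eval_accuracy" in e]

for e in evals:
    print(f"step {e['step']:>4}: accuracy={e['eval_accuracy']:.4f}  f1={e['eval_f1']:.4f}")

best = max(evals, key=lambda e: e["eval_f1"])
print("best eval_f1:", best["eval_f1"], "at step", best["step"])  # matches best_metric (step 1830)
```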