elmadany committed
Commit bd1c657 · verified · 1 Parent(s): 4a57f1a

Initial model upload

Files changed (2)
  1. .ipynb_checkpoints/README-checkpoint.md +211 -0
  2. README.md +206 -92
README.md CHANGED
@@ -1,97 +1,211 @@
  ---
- license: apache-2.0
- base_model: facebook/wav2vec2-xls-r-1b
  tags:
- - automatic-speech-recognition
- - /mnt/home/elmadany/elmadany_workspace/African_speechT5/jasmine-raid/elmadany_work/HF_format/SimbaBench_tasks_ft
- - generated_from_trainer
  metrics:
- - wer
- model-index:
- - name: wav2vec2-xls-r-1b
- - results: []
  ---
 
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- # wav2vec2-xls-r-1b
-
- This model is a fine-tuned version of [facebook/wav2vec2-xls-r-1b](https://huggingface.co/facebook/wav2vec2-xls-r-1b) on the /MNT/HOME/ELMADANY/ELMADANY_WORKSPACE/AFRICAN_SPEECHT5/JASMINE-RAID/ELMADANY_WORK/HF_FORMAT/SIMBABENCH_TASKS_FT - ASR_FT_DATA_BATCH_2 dataset.
- It achieves the following results on the evaluation set:
- - Loss: 1.3260
- - Wer: 0.8212
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 0.0001
- - train_batch_size: 8
- - eval_batch_size: 8
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 4
- - gradient_accumulation_steps: 8
- - total_train_batch_size: 256
- - total_eval_batch_size: 32
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: linear
- - lr_scheduler_warmup_steps: 1000
- - num_epochs: 30.0
-
- ### Training results
-
- | Training Loss | Epoch | Step | Validation Loss | Wer |
- |:-------------:|:-----:|:-----:|:---------------:|:------:|
- | 4.5214 | 1.0 | 610 | 2.5037 | 0.9721 |
- | 0.5588 | 2.0 | 1221 | 2.3758 | 0.9199 |
- | 0.4547 | 3.0 | 1831 | 2.2466 | 0.9419 |
- | 0.4025 | 4.0 | 2442 | 2.2313 | 0.8876 |
- | 0.3789 | 5.0 | 3053 | 2.2444 | 0.8845 |
- | 0.3542 | 6.0 | 3663 | 1.9893 | 0.8914 |
- | 0.3383 | 7.0 | 4274 | 1.3485 | 0.8357 |
- | 0.3239 | 8.0 | 4885 | 1.9888 | 0.8776 |
- | 0.3074 | 9.0 | 5495 | 1.9462 | 0.8689 |
- | 0.2883 | 10.0 | 6106 | 1.5767 | 0.8871 |
- | 0.2749 | 11.0 | 6716 | 1.3260 | 0.8212 |
- | 0.2563 | 12.0 | 7327 | 1.8270 | 0.8440 |
- | 0.2474 | 13.0 | 7938 | 1.8219 | 0.8677 |
- | 0.2347 | 14.0 | 8548 | 1.5346 | 0.8635 |
- | 0.2211 | 15.0 | 9159 | 1.7185 | 0.8636 |
- | 0.2117 | 16.0 | 9770 | 1.8663 | 0.8697 |
- | 0.1987 | 17.0 | 10380 | 1.4298 | 0.8687 |
- | 0.1814 | 18.0 | 10991 | 1.5630 | 0.8680 |
- | 0.1694 | 19.0 | 11601 | 1.3627 | 0.8573 |
- | 0.1597 | 20.0 | 12212 | 1.7108 | 0.8642 |
- | 0.1517 | 21.0 | 12823 | 1.8344 | 0.8794 |
- | 0.1405 | 22.0 | 13433 | 1.4838 | 0.8508 |
- | 0.1262 | 23.0 | 14044 | 1.5322 | 0.8415 |
- | 0.1171 | 24.0 | 14655 | 1.7095 | 0.8682 |
- | 0.1079 | 25.0 | 15265 | 1.7445 | 0.8719 |
- | 0.0996 | 26.0 | 15876 | 1.7322 | 0.8502 |
- | 0.0922 | 27.0 | 16486 | 1.8349 | 0.8625 |
- | 0.0855 | 28.0 | 17097 | 1.8259 | 0.8646 |
- | 0.081 | 29.0 | 17708 | 1.8187 | 0.8651 |
- | 0.0771 | 29.97 | 18300 | 1.8427 | 0.8624 |
-
-
- ### Framework versions
-
- - Transformers 4.33.2
- - Pytorch 2.0.1+cu117
- - Datasets 3.5.0
- - Tokenizers 0.13.3
 
  ---
+ language:
+ - am # Amharic
+ - ar # Arabic
+ - tw # Asante Twi
+ - bm # Bambara
+ - fr # French
+ - lg # Ganda
+ - ha # Hausa
+ - ig # Igbo
+ - rw # Kinyarwanda
+ - kg # Kongo
+ - ln # Lingala
+ - lu # Luba-Katanga
+ - mg # Malagasy
+ - nso # Northern Sotho
+ - ny # Nyanja
+ - om # Oromo
+ - pt # Portuguese
+ - sn # Shona
+ - so # Somali
+ - st # Southern Sotho
+ - sw # Swahili
+ - ss # Swati
+ - ti # Tigrinya
+ - ts # Tsonga
+ - tn # Tswana
+ - ak # Twi
+ - ve # Venda
+ - wo # Wolof
+ - xh # Xhosa
+ - yo # Yoruba
+ - zu # Zulu
+ - tzm # Tamazight
+ - sg # Sango
+ - din # Dinka
+ - ee # Ewe
+ - fon # Fon
+ - luo # Luo
+ - mos # Mossi
+ - umb # Umbundu
+ license: cc-by-4.0
  tags:
+ - automatic-speech-recognition
+ - audio
+ - speech
+ - african-languages
+ - multilingual
+ - simba
+ - low-resource
+ - speech-recognition
+ - asr
+ datasets:
+ - UBC-NLP/SimbaBench
  metrics:
+ - wer
+ - cer
+ library_name: transformers
+ pipeline_tag: automatic-speech-recognition
  ---
+ <div align="center">
+
+ <img src="https://africa.dlnlp.ai/simba/images/VoC_simba" alt="VoC Simba Models Logo">
+
+ [![EMNLP 2025 Paper](https://img.shields.io/badge/EMNLP_2025-Paper-B31B1B?style=for-the-badge&logo=arxiv&logoColor=B31B1B&labelColor=FFCDD2)](https://aclanthology.org/2025.emnlp-main.559/)
+ [![Official Website](https://img.shields.io/badge/Official-Website-2EA44F?style=for-the-badge&logo=googlechrome&logoColor=2EA44F&labelColor=C8E6C9)](https://africa.dlnlp.ai/simba/)
+ [![SimbaBench](https://img.shields.io/badge/SimbaBench-Benchmark-8A2BE2?style=for-the-badge&logo=googlecharts&logoColor=8A2BE2&labelColor=E1BEE7)](#simbabench)
+ [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-FFD21E?style=for-the-badge&logoColor=black&labelColor=FFF9C4)](https://huggingface.co/collections/UBC-NLP/simba-speech-series)
+ [![YouTube Video](https://img.shields.io/badge/YouTube-Video-FF0000?style=for-the-badge&logo=youtube&logoColor=FF0000&labelColor=FFCCBC)](#demo)
+
+ </div>
+
+ ## *Bridging the Digital Divide for African AI*
+
+ **Voice of a Continent** is a comprehensive open-source ecosystem designed to bring African languages to the forefront of artificial intelligence. By providing a unified suite of benchmarking tools and state-of-the-art models, we ensure that the future of speech technology is inclusive, representative, and accessible to over a billion people.
+
+ ## Best-in-Class Multilingual Models
+
+ Introduced in our EMNLP 2025 paper *[Voice of a Continent](https://aclanthology.org/2025.emnlp-main.559/)*, the **Simba Series** represents the current state of the art for African speech AI.
+
+ - **Unified Suite:** Models optimized for African languages.
+ - **Superior Accuracy:** Outperforms generic multilingual models by leveraging SimbaBench's high-quality, domain-diverse datasets.
+ - **Multitask Capability:** Designed for high performance in ASR (Automatic Speech Recognition) and TTS (Text-to-Speech).
+ - **Inclusion-First:** Built specifically to narrow the digital divide by empowering speakers of underrepresented languages.
+
+ The **Simba** family consists of state-of-the-art models fine-tuned on SimbaBench. These models achieve superior performance by leveraging dataset quality, domain diversity, and language-family relationships.
+
+ ### 🗣️✍️ Simba-ASR
+ > **The New Standard for African Speech-to-Text**
+
+ **🎯 Task** `Automatic Speech Recognition` — powering high-accuracy transcription across the continent.
+
+ **🌍 Language Coverage (43 African languages)**
+ > **Amharic** (`amh`), **Arabic** (`ara`), **Asante Twi** (`asanti`), **Bambara** (`bam`), **Baoulé** (`bau`), **Bemba** (`bem`), **Ewe** (`ewe`), **Fanti** (`fat`), **Fon** (`fon`), **French** (`fra`), **Ganda** (`lug`), **Hausa** (`hau`), **Igbo** (`ibo`), **Kabiye** (`kab`), **Kinyarwanda** (`kin`), **Kongo** (`kon`), **Lingala** (`lin`), **Luba-Katanga** (`lub`), **Luo** (`luo`), **Malagasy** (`mlg`), **Mossi** (`mos`), **Northern Sotho** (`nso`), **Nyanja** (`nya`), **Oromo** (`orm`), **Portuguese** (`por`), **Shona** (`sna`), **Somali** (`som`), **Southern Sotho** (`sot`), **Swahili** (`swa`), **Swati** (`ssw`), **Tigrinya** (`tir`), **Tsonga** (`tso`), **Tswana** (`tsn`), **Twi** (`twi`), **Umbundu** (`umb`), **Venda** (`ven`), **Wolof** (`wol`), **Xhosa** (`xho`), **Yoruba** (`yor`), **Zulu** (`zul`), **Tamazight** (`tzm`), **Sango** (`sag`), **Dinka** (`din`).
+
+ **🏗️ Base Architectures**
+
+ - **Simba-S** (SeamlessM4T-v2-MT) — *Top Performer*
+ - **Simba-W** (Whisper-v3-large)
+ - **Simba-X** (Wav2Vec2-XLS-R-2b)
+ - **Simba-M** (MMS-1b-all)
+ - **Simba-H** (AfriHuBERT)
+
+ **🌐 Explore the Frontier**
+
+ | **ASR Models** | **Architecture** | **#Parameters** | **🤗 Hugging Face Model Card** | **Status** |
+ |---------|:------------------:|:------------------:|:------------------:|:------------------:|
+ | 🔥**Simba-S**🔥 | SeamlessM4T-v2 | 2.3B | 🤗 [https://huggingface.co/UBC-NLP/Simba-S](https://huggingface.co/UBC-NLP/Simba-S) | ✅ Released |
+ | 🔥**Simba-W**🔥 | Whisper | 1.5B | 🤗 [https://huggingface.co/UBC-NLP/Simba-W](https://huggingface.co/UBC-NLP/Simba-W) | ✅ Released |
+ | 🔥**Simba-X**🔥 | Wav2Vec2 | 1B | 🤗 [https://huggingface.co/UBC-NLP/Simba-X](https://huggingface.co/UBC-NLP/Simba-X) | ✅ Released |
+ | 🔥**Simba-M**🔥 | MMS | 1B | 🤗 [https://huggingface.co/UBC-NLP/Simba-M](https://huggingface.co/UBC-NLP/Simba-M) | ✅ Released |
+ | 🔥**Simba-H**🔥 | HuBERT | 94M | 🤗 [https://huggingface.co/UBC-NLP/Simba-H](https://huggingface.co/UBC-NLP/Simba-H) | ✅ Released |
+
+ * **Simba-S** emerged as the best-performing ASR model overall.
+
+
+ **🧩 Usage Example**
+
+ You can easily run inference using the Hugging Face `transformers` library.
+
+ ```python
+ from transformers import pipeline
+
+ # Load a Simba model for ASR. Available models: `UBC-NLP/Simba-S`, `UBC-NLP/Simba-W`,
+ # `UBC-NLP/Simba-X`, `UBC-NLP/Simba-M`, `UBC-NLP/Simba-H`
+ asr_pipeline = pipeline(
+     "automatic-speech-recognition",
+     model="UBC-NLP/Simba-S",
+ )
+
+ # Load the multilingual African adapter (only needed for `UBC-NLP/Simba-M`)
+ # asr_pipeline.model.load_adapter("multilingual_african")
+
+ # Transcribe audio from a file path or URL
+ result = asr_pipeline("https://africa.dlnlp.ai/simba/audio/afr_Lwazi_afr_test_idx3889.wav")
+ print(result["text"])
+
+ # Transcribe audio from an in-memory array (mono, 16 kHz)
+ result = asr_pipeline({
+     "array": audio_array,
+     "sampling_rate": 16_000,
+ })
+ print(result["text"])
+ ```
+
+ #### Example Outputs
+
+ Using the same audio file with different Simba models:
+
+ ```python
+ # Simba-S
+ {'text': 'watter verontwaardiging sou daar, in ons binneste gewees het.'}
+ ```
+
+ ```python
+ # Simba-W
+ {'text': 'watter veronwaardigingsel daar, in ons binneste gewees het.'}
+ ```
+
+ ```python
+ # Simba-X
+ {'text': 'fator fr on ar taamsodr is'}
+ ```
+
+ ```python
+ # Simba-M
+ {'text': 'watter veronwaardiging sodaar in ons binniste gewees het'}
+ ```
+
+ ```python
+ # Simba-H
+ {'text': 'watter vironwaardiging so daar in ons binneste geweeshet'}
+ ```
+
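
These models are evaluated with WER and CER. As a reference for reading those numbers (and not part of the original card), word error rate is the word-level Levenshtein distance between reference and hypothesis divided by the number of reference words; the minimal pure-Python sketch below illustrates this on two of the example transcripts. Production evaluation typically uses a library such as `jiwer` instead.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance over token sequences (substitution/insertion/deletion, cost 1 each)."""
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return prev[-1]

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    return edit_distance(reference.split(), hypothesis.split()) / len(reference.split())

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)

reference = "watter verontwaardiging sou daar in ons binneste gewees het"
hypothesis = "watter veronwaardiging sodaar in ons binniste gewees het"
print(round(wer(reference, hypothesis), 3))  # 0.444 (4 word errors over 9 reference words)
```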
+
+ Get started with Simba models in minutes using our interactive Colab notebook: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://github.com/UBC-NLP/simba/blob/main/simba_models.ipynb)
+
+
+ ## Citation
+
+ If you use the Simba models or the SimbaBench benchmark in a scientific publication, or if you find these resources useful, please cite our paper.
+
+ ```bibtex
+ @inproceedings{elmadany-etal-2025-voice,
+     title = "Voice of a Continent: Mapping {A}frica{'}s Speech Technology Frontier",
+     author = "Elmadany, AbdelRahim A. and
+       Kwon, Sang Yun and
+       Toyin, Hawau Olamide and
+       Alcoba Inciarte, Alcides and
+       Aldarmaki, Hanan and
+       Abdul-Mageed, Muhammad",
+     editor = "Christodoulopoulos, Christos and
+       Chakraborty, Tanmoy and
+       Rose, Carolyn and
+       Peng, Violet",
+     booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
+     month = nov,
+     year = "2025",
+     address = "Suzhou, China",
+     publisher = "Association for Computational Linguistics",
+     url = "https://aclanthology.org/2025.emnlp-main.559/",
+     doi = "10.18653/v1/2025.emnlp-main.559",
+     pages = "11039--11061",
+     ISBN = "979-8-89176-332-6",
+ }
+ ```