# mms-300m-fongbe
This model is a fine-tuned version of facebook/mms-300m specifically for Fongbe (Fon), a tonal language primarily spoken in Benin.
It was developed to preserve linguistic integrity by maintaining critical tonal diacritics and unique orthographic characters (e.g., ɖ, ɛ, ɔ, è, é). This model achieves state-of-the-art (SOTA) results for Fongbe Automatic Speech Recognition (ASR) on the ALFFA test benchmark.
## Evaluation Results
The model was evaluated on the held-out ALFFA test set (2,168 utterances):
| Metric | Score |
|---|---|
| WER (Word Error Rate) | 0.0948 (9.48%) |
| CER (Character Error Rate) | 0.0396 (3.96%) |
### Benchmark Comparison (with diacritics)
| Model | WER (%) | CER (%) | Year |
|---|---|---|---|
| Laleye et al. (Baseline) | 44.04 | — | 2016 |
| MMS-300m-Fongbe (Ours) | 9.48 | 3.96 | 2026 |
### Inference Examples
| Reference | Prediction | Result |
|---|---|---|
| gannu elΙ kpΙ hu Ιe Ι | gannu elΙ kpΙ hu Ιe Ι | ✅ Perfect |
| ΙΙla tɛnwe | ΙΙla tɛnwe | ✅ Perfect |
| ama e gbΙ mΙ Ιo nΙ Ι nu e wɛ e nΙ Ιu | ama e gbΙ mΙ Ιo nΙ Ι nu Ι e nΙ Ιu | ⚠️ Minor error |
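The WER and CER scores above are Levenshtein edit distances over words and characters, normalized by reference length. The card does not specify the exact evaluation script, but a minimal self-contained sketch of how these metrics are computed looks like this:

```python
def edit_distance(ref, hyp):
    # Classic one-row dynamic-programming Levenshtein distance over tokens.
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (r != h))
    return d[-1]

def wer(reference, hypothesis):
    # Word Error Rate: token-level edit distance / number of reference words.
    ref = reference.split()
    return edit_distance(ref, hypothesis.split()) / len(ref)

def cer(reference, hypothesis):
    # Character Error Rate: character-level edit distance / reference length.
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

For example, a hypothesis with one substituted word out of four reference words gives `wer(...) == 0.25`. Libraries such as `jiwer` implement the same metrics with extra normalization options.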
## Model Description
- Architecture: MMS (Massive Multilingual Speech) 300M parameter model.
- Methodology: Fine-tuned with Connectionist Temporal Classification (CTC) loss.
- Language: Fongbe (fon).
- Phonetic Representation: Tone-preserved orthography using NFD/NFC normalization.
- Special Features: Full support for Fon-specific characters (ɖ, ɛ, ɔ) and tone markers.
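Why NFD/NFC normalization matters here: the same tone-marked character can be encoded either precomposed or as a base letter plus a combining mark, and mixing the two silently splits one character into inconsistent token sequences in a CTC vocabulary. The exact preprocessing pipeline is not published; a minimal sketch of the idea using the standard library:

```python
import unicodedata

# The same accented character has two valid encodings:
nfd = "e\u0301"                          # 'e' + combining acute accent (NFD)
nfc = unicodedata.normalize("NFC", nfd)  # single precomposed 'é' (NFC)
assert nfd != nfc and len(nfc) == 1

def normalize_transcript(text: str) -> str:
    # Pin every transcript to one normalization form before building the
    # CTC character vocabulary, so each tone-marked letter maps to exactly
    # one consistent sequence of codepoints.
    return unicodedata.normalize("NFC", text)
```

Note that some Fon letters (e.g. ɔ with an acute tone mark) have no precomposed Unicode codepoint, so NFC keeps the combining mark; normalization still guarantees one consistent representation per character.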
## How to Use
```python
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="Professor/mms-300m-fongbe")

# Ensure your audio is sampled at 16 kHz
transcription = asr("path_to_audio.wav")
print(transcription["text"])
```
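Since the model expects 16 kHz input, it can be worth validating the sample rate before transcription. A small stdlib-only guard for WAV files (the function name is illustrative, not part of this model's API):

```python
import wave

def assert_16khz(path: str) -> None:
    # MMS checkpoints are trained on 16 kHz audio; feeding another sample
    # rate will silently degrade transcription quality.
    with wave.open(path, "rb") as f:
        rate = f.getframerate()
    if rate != 16_000:
        raise ValueError(f"{path} is {rate} Hz; resample to 16 000 Hz first")
```

To resample, common options include `librosa.load(path, sr=16000)` or `torchaudio.functional.resample(waveform, orig_freq, 16000)`.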
## Intended Uses & Limitations
### Intended Uses
- High-accuracy transcription of Fongbe speech.
- Research in low-resource and tonal language modeling.
- Base model for downstream Fongbe NLP tasks (NLP4Fon).
### Limitations
- Performance may degrade in noisy environments or with heavy background music.
- Primarily trained on continuous speech; may require further fine-tuning for specific dialects or extremely fast colloquial speech.
## Training and Evaluation Data
The model was trained on a consolidated dataset merging the ALFFA Project (African Languages in the Field) data and the Zenodo Fongbe Speech Dataset:
- Train + Validation Set: ~10.85 hours (Merged and re-split 90/10).
- Test Set: ~1.45 hours (Standard 2,168 utterances from ALFFA for benchmark consistency).
- Sampling Rate: 16,000 Hz.
## Training Procedure
### Hyperparameters
- Learning Rate: 1e-4
- Effective Batch Size: 64 (Batch 16 x 4 Grad Accumulation)
- Optimizer: AdamW (Fused)
- Epochs: 30
- Precision: Mixed Precision (FP16)
- Hardware: NVIDIA H100 GPU
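The hyperparameters above map directly onto standard `transformers` `TrainingArguments` fields. The actual training script is not published, so the following is a hypothetical reconstruction of the listed configuration, not the author's code:

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the listed hyperparameters.
args = TrainingArguments(
    output_dir="mms-300m-fongbe",
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=4,   # effective batch size: 16 x 4 = 64
    num_train_epochs=30,
    fp16=True,                       # mixed precision
    optim="adamw_torch_fused",       # fused AdamW
)
```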
### Training Logs
| Training Loss | Epoch | Step | Validation Loss | WER |
|---|---|---|---|---|
| 26.3861 | 3.11 | 500 | 1.0171 | 0.6021 |
| 2.5796 | 6.21 | 1000 | 0.3366 | 0.2600 |
| 1.3316 | 9.32 | 1500 | 0.2312 | 0.1799 |
| 0.9087 | 12.42 | 2000 | 0.2031 | 0.1557 |
| 0.6678 | 15.53 | 2500 | 0.1752 | 0.1397 |
| 0.5069 | 18.64 | 3000 | 0.1747 | 0.1325 |
| 0.4034 | 21.74 | 3500 | 0.1583 | 0.1137 |
| 0.3142 | 24.85 | 4000 | 0.1618 | 0.1147 |
| 0.2622 | 27.95 | 4500 | 0.1656 | 0.1085 |
## Citation & Credits
If you use this model in your research, please cite the following:
- Dataset Contributors: Laleye, Fréjus A. A., et al. (ALFFA Project & Zenodo release).
- Model Developer: Victor Olufemi (Professor).
```bibtex
@dataset{laleye_frejus_2022_6604637,
  author    = {Laleye, Fréjus A. A.},
  title     = {Fongbe Speech Dataset},
  year      = 2022,
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.6604637}
}

@inproceedings{laleye2016FongbeASR,
  title        = {First Automatic Fongbe Continuous Speech Recognition System},
  author       = {Laleye, Fréjus A. A. and Besacier, Laurent and Ezin, Eugène C. and Motamed, Cina},
  year         = {2016},
  organization = {FedCSIS}
}
```