---
language:
  - fon
license: cc-by-nc-4.0
base_model: facebook/mms-300m
tags:
  - automatic-speech-recognition
  - mms
  - fongbe
  - african-languages
  - low-resource-languages
  - tone-preserved
  - audio
datasets:
  - Professor/fongbe-speech-zenodo
metrics:
  - wer
  - cer
library_name: transformers
pipeline_tag: automatic-speech-recognition
model-index:
  - name: mms-300m-fongbe
    results:
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Fongbe Speech Zenodo (ALFFA + Zenodo)
          type: Professor/fongbe-speech-zenodo
          config: default
          split: test
        metrics:
          - name: Test WER
            type: wer
            value: 0.0948
          - name: Test CER
            type: cer
            value: 0.0396
---

# mms-300m-fongbe

This model is a fine-tuned version of [facebook/mms-300m](https://huggingface.co/facebook/mms-300m) for Fongbe (Fon), a tonal language primarily spoken in Benin.

It preserves linguistic integrity by maintaining critical tonal diacritics and Fon-specific orthographic characters (e.g., ɖ, ɛ, ɔ, è, é), and achieves state-of-the-art (SOTA) results for Fongbe Automatic Speech Recognition (ASR) on the ALFFA test benchmark.

## 📊 Evaluation Results

The model was evaluated on the held-out ALFFA test set (2,168 utterances):

| Metric | Score |
|---|---|
| WER (Word Error Rate) | 0.0948 (9.48%) |
| CER (Character Error Rate) | 0.0396 (3.96%) |
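For reference, WER and CER are the word- and character-level edit distances normalized by the reference length. A minimal pure-Python sketch of the metrics (illustrative only, not the evaluation script used for the numbers above):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (single-row DP)."""
    dp = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, len(hyp) + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                           # deletion
                        dp[j - 1] + 1,                       # insertion
                        prev + (ref[i - 1] != hyp[j - 1]))   # substitution
            prev = cur
    return dp[-1]

def wer(reference, hypothesis):
    """Word Error Rate: word-level edits / reference word count."""
    words = reference.split()
    return edit_distance(words, hypothesis.split()) / len(words)

def cer(reference, hypothesis):
    """Character Error Rate: character-level edits / reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)

print(wer("gannu elɔ kpɔ hu ɖe ɔ", "gannu elɔ kpɔ hu ɖe ɔ"))  # 0.0 (exact match)
```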

### Benchmark Comparison (with diacritics)

| Model | WER (%) | CER (%) | Year |
|---|---|---|---|
| Laleye et al. (baseline) | 44.04 | n/a | 2016 |
| MMS-300m-Fongbe (ours) | 9.48 | 3.96 | 2026 |

### Inference Examples

| Reference | Prediction | Result |
|---|---|---|
| gannu elɔ kpɔ hu ɖe ɔ | gannu elɔ kpɔ hu ɖe ɔ | ✅ Perfect |
| ɖɔla tεnwe | ɖɔla tεnwe | ✅ Perfect |
| ama e gbɔ mɔ ɖo nɔ ɔ nu e wε e nɔ ɖu | ama e gbɔ mɔ ɖo nɔ ɔ nu ɔ e nɔ ɖu | ⚠️ Minor error |
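The "minor error" case can be localized with a word-level diff; a small sketch using Python's standard `difflib` (illustrative only):

```python
from difflib import SequenceMatcher

reference = "ama e gbɔ mɔ ɖo nɔ ɔ nu e wε e nɔ ɖu".split()
prediction = "ama e gbɔ mɔ ɖo nɔ ɔ nu ɔ e nɔ ɖu".split()

# Collect every span where the prediction diverges from the reference
diffs = [(op, reference[i1:i2], prediction[j1:j2])
         for op, i1, i2, j1, j2
         in SequenceMatcher(None, reference, prediction).get_opcodes()
         if op != "equal"]

for op, ref_span, hyp_span in diffs:
    print(op, ref_span, "->", hyp_span)
```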

## 📖 Model Description

- **Architecture:** MMS (Massive Multilingual Speech) 300M-parameter model.
- **Methodology:** fine-tuned with Connectionist Temporal Classification (CTC) loss.
- **Language:** Fongbe (`fon`).
- **Phonetic Representation:** tone-preserved orthography using NFD/NFC normalization.
- **Special Features:** full support for Fon-specific characters (ɖ, ɛ, ɔ) and tone markers.
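The NFD/NFC point matters because tone-marked vowels such as é have two valid Unicode encodings: a single precomposed code point, or a base letter plus a combining accent. A quick illustration with Python's `unicodedata` (not the model's own preprocessing code):

```python
import unicodedata

# "é" as precomposed (U+00E9) vs. "e" + combining acute accent (U+0301)
precomposed = "\u00e9"
decomposed = "e\u0301"

# Visually identical, but distinct code-point sequences
assert precomposed != decomposed
assert len(precomposed) == 1 and len(decomposed) == 2

# NFC collapses both to the precomposed form, so a tone-marked token
# always maps to a single, consistent vocabulary entry
assert unicodedata.normalize("NFC", decomposed) == precomposed

# NFD does the reverse: base letter + one combining mark per diacritic
assert unicodedata.normalize("NFD", precomposed) == decomposed
```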

## 🚀 How to Use

```python
from transformers import pipeline

# Load the fine-tuned Fongbe ASR model
asr = pipeline("automatic-speech-recognition", model="Professor/mms-300m-fongbe")

# Ensure your audio is 16 kHz mono
transcription = asr("path_to_audio.wav")
print(transcription["text"])
```

## 🎯 Intended Uses & Limitations

### Intended Uses

- High-accuracy transcription of Fongbe speech.
- Research in low-resource and tonal language modeling.
- Base model for downstream Fongbe NLP tasks (NLP4Fon).

### Limitations

- Performance may degrade in noisy environments or with heavy background music.
- Primarily trained on continuous speech; may require further fine-tuning for specific dialects or extremely fast colloquial speech.

## 📁 Training and Evaluation Data

The model was trained on a consolidated dataset merging the ALFFA Project (African Languages in the Field) data and the Zenodo Fongbe Speech Dataset:

- **Train + Validation Set:** ~10.85 hours (merged and re-split 90/10).
- **Test Set:** ~1.45 hours (the standard 2,168 ALFFA utterances, for benchmark consistency).
- **Sampling Rate:** 16,000 Hz.

## ⚙️ Training Procedure

### Hyperparameters

- **Learning Rate:** 1e-4
- **Effective Batch Size:** 64 (batch size 16 × 4 gradient-accumulation steps)
- **Optimizer:** AdamW (fused)
- **Epochs:** 30
- **Precision:** mixed precision (FP16)
- **Hardware:** NVIDIA H100 GPU
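The hyperparameters above map onto the 🤗 `Trainer` configuration roughly as follows. This is a hypothetical reconstruction (the actual training script is not published here); the argument names are from the standard `transformers.TrainingArguments` API, and `output_dir` is an illustrative placeholder:

```python
from transformers import TrainingArguments

# Hypothetical sketch of the settings listed above
training_args = TrainingArguments(
    output_dir="mms-300m-fongbe",     # placeholder output path
    learning_rate=1e-4,               # Learning Rate
    per_device_train_batch_size=16,   # batch size 16 ...
    gradient_accumulation_steps=4,    # ... x 4 = effective batch size 64
    num_train_epochs=30,              # Epochs
    fp16=True,                        # Mixed Precision (FP16)
    optim="adamw_torch_fused",        # AdamW (Fused)
)
```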

### Training Logs

| Training Loss | Epoch | Step | Validation Loss | WER |
|---|---|---|---|---|
| 26.3861 | 3.11 | 500 | 1.0171 | 0.6021 |
| 2.5796 | 6.21 | 1000 | 0.3366 | 0.2600 |
| 1.3316 | 9.32 | 1500 | 0.2312 | 0.1799 |
| 0.9087 | 12.42 | 2000 | 0.2031 | 0.1557 |
| 0.6678 | 15.53 | 2500 | 0.1752 | 0.1397 |
| 0.5069 | 18.64 | 3000 | 0.1747 | 0.1325 |
| 0.4034 | 21.74 | 3500 | 0.1583 | 0.1137 |
| 0.3142 | 24.85 | 4000 | 0.1618 | 0.1147 |
| 0.2622 | 27.95 | 4500 | 0.1656 | 0.1085 |

## 📜 Citation & Credits

If you use this model in your research, please cite the following:

**Dataset Contributors:** Laleye, Fréjus A. A., et al. (ALFFA Project & Zenodo release).

**Model Developer:** Victor Olufemi (Professor).

```bibtex
@dataset{laleye_frejus_2022_6604637,
  author    = {Laleye, Fréjus A. A.},
  title     = {Fongbe Speech Dataset},
  year      = {2022},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.6604637}
}

@inproceedings{laleye2016FongbeASR,
  title        = {First Automatic Fongbe Continuous Speech Recognition System},
  author       = {Laleye, Fréjus A. A. and Besacier, Laurent and Ezin, Eugène C. and Motamed, Cina},
  year         = {2016},
  organization = {FedCSIS}
}
```