MALIBA-AI/bambara-asr-v3

A Bambara automatic speech recognition model, built on OpenAI's openai/whisper-large-v3.

Currently ranked #1 on the Bambara ASR Benchmark Leaderboard, making it the best publicly available Bambara ASR model as of January 2026.

⚠️ Non-commercial use only. One of the training sources (Bible recordings) carries a license that restricts commercial use. See License.

Evaluation

Internal Test Set

Evaluated on oza75/bambara-asr (clean-combined split, 2,088 samples, ~3 hours of audio).

| Condition | WER (%) | CER (%) |
|---|---|---|
| Raw (no normalization) | 17.99 | 8.08 |
| Minimal normalization | 16.94 | 7.74 |
| Normalized (expand mode) | 13.23 | 6.89 |
| Normalized (contract mode) | 13.79 | 6.89 |

Normalization applied using bambara-text-normalization with the for_wer_evaluation() preset. Expand mode handles contraction disambiguation (e.g., k'a → ka a), while contract mode collapses expanded forms (e.g., bɛ a → b'a). See the normalizer documentation for details on contraction modes.
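Normalization directly lowers measured WER by removing spurious surface mismatches before scoring. The sketch below shows the mechanics with a toy lowercase/punctuation-stripping normalizer standing in for the bambara-text-normalization preset (the real package's API is not reproduced here), and a self-contained word-level WER implementation:

```python
import re

def toy_normalize(text: str) -> str:
    """Stand-in for the bambara-text-normalization preset:
    lowercase and strip punctuation (illustration only)."""
    return re.sub(r"[^\w\s']", "", text.lower()).strip()

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via Levenshtein distance over word tokens."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[-1][-1] / len(ref)

# Case differences and punctuation count as errors without normalization:
print(wer("Aw ni ce!", "aw ni ce"))                                # 2/3
print(wer(toy_normalize("Aw ni ce!"), toy_normalize("aw ni ce")))  # 0.0
```

The same pattern applies to CER by comparing character sequences instead of word tokens.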

*Figure: normalization impact*

Bambara ASR Leaderboard

Evaluated on MALIBA-AI/bambara-asr-benchmark: 1 hour of studio-recorded Malian constitutional text (pure Bambara), validated by linguists from Mali's DNENF-LN.

| Metric | Score |
|---|---|
| WER | 45.73% |
| CER | 13.45% |
| Combined (0.5 × WER + 0.5 × CER) | 29.59% |
| Rank | 🏆 1st / 37 models |
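The combined leaderboard score is a plain average of WER and CER:

```python
wer, cer = 45.73, 13.45  # benchmark scores from the table above
combined = 0.5 * wer + 0.5 * cer
print(round(combined, 2))  # 29.59
```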

See the Leaderboard

What Changed from v1

| | v1 | v3 |
|---|---|---|
| Base model | whisper-large-v2 | whisper-large-v3 |
| Training data | jeli-asr + Mali-Pense | jeli-asr + Mali-Pense + Bible + Common Voice (fr/en) |
| Benchmark WER | 61.74% | 45.73% |
| Benchmark CER | 17.90% | 13.45% |
| License | Apache 2.0 | CC-BY-NC-4.0 (non-commercial) |

The ~16 point WER improvement comes from both the stronger base model and broader, more diverse training data.
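The improvement quoted above is simply the benchmark WER delta between versions:

```python
wer_v1, wer_v3 = 61.74, 45.73  # leaderboard WER for each version
print(round(wer_v1 - wer_v3, 2))  # 16.01 points
```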

*Figure: v1 vs v3 comparison*

Training Data

| Source | Language | Description |
|---|---|---|
| RobotsMali/jeli-asr | Bambara | Conversational and read speech |
| Mali-Pense | Bambara | Transcribed audio from the Mali-Pense linguistic project |
| Bible recordings | Bambara | Narrated Bible text (formal register); non-commercial license |
| Common Voice | French, English | Supplementary multilingual data |

The mix of conversational (jeli-asr), formal/literary (Bible, Mali-Pense), and multilingual (Common Voice) data gives broader coverage than previous Bambara-only systems. The French and English data helps the model handle code-switching, which is common in everyday Bambara speech in Mali.

Training

  • Base model: openai/whisper-large-v3
  • Method: LoRA (Low-Rank Adaptation) via PEFT
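A LoRA adapter configuration of the kind described above might look like the following sketch. The rank, alpha, dropout, and target modules here are illustrative assumptions, not the hyperparameters actually used to train bambara-asr-v3:

```python
from peft import LoraConfig

# Hypothetical adapter configuration for Whisper fine-tuning.
# All hyperparameter values below are assumptions for illustration.
lora_config = LoraConfig(
    r=32,                                 # low-rank dimension (assumed)
    lora_alpha=64,                        # scaling factor (assumed)
    target_modules=["q_proj", "v_proj"],  # attention projections (assumed)
    lora_dropout=0.05,
)
```

With PEFT, this config is applied to the base model via `get_peft_model`, so only the small adapter matrices are trained while the frozen Whisper weights are reused.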

Usage

```bash
pip install git+https://github.com/sudoping01/whosper.git
```

```python
from whosper import WhosperTranscriber

transcriber = WhosperTranscriber(model_id="MALIBA-AI/bambara-asr-v3")

result = transcriber.transcribe_audio("path/to/audio.wav")
print(result)
```

Intended Use

For:

  • Research on Bambara and low-resource African language ASR
  • Benchmarking and comparison of ASR systems
  • Transcription assistance
  • Educational and non-profit applications
  • Linguistic analysis and documentation

Not for:

  • Commercial products or services (license restriction from Bible training data)
  • Medical, legal, or safety-critical transcription

License

CC-BY-NC-4.0 — non-commercial use only.

This restriction exists because the Bible audio used in training does not permit commercial use. If you need a commercially-licensed Bambara ASR model, see MALIBA-AI/bambara-asr-v1 (Apache 2.0, lower accuracy).

Citation

@misc{maliba_asr_v3,
  author       = {{MALIBA-AI}},
  title        = {MALIBA-AI Bambara ASR v3},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/MALIBA-AI/bambara-asr-v3}}
}

If reporting benchmark results, please also cite:

@misc{BambaraASRBenchmark2025,
  title        = {Where Are We at with Automatic Speech Recognition for the Bambara Language?},
  author       = {Seydou Diallo and Yacouba Diarra and Mamadou K. Keita and Panga Azazia Kamat{\'e} and Adam Bouno Kampo and Aboubacar Ouattara},
  year         = {2025},
  howpublished = {Hugging Face Datasets},
  url          = {https://huggingface.co/datasets/MALIBA-AI/bambara-asr-benchmark}
}