MALIBA-AI/bambara-asr-v3

A Bambara automatic speech recognition model, built on OpenAI's openai/whisper-large-v3.

Currently ranked #1 on the Bambara ASR Benchmark Leaderboard, making it the best publicly available Bambara ASR model as of January 2026.

⚠️ Non-commercial use only. One of the training sources (Bible recordings) carries a license that restricts commercial use. See License.

Evaluation

Internal Test Set

Evaluated on oza75/bambara-asr (clean-combined split, 2,088 samples, ~3 hours of audio).

| Condition | WER (%) | CER (%) |
|---|---|---|
| Raw (no normalization) | 17.99 | 8.08 |
| Minimal normalization | 16.94 | 7.74 |
| Normalized (expand mode) | 13.23 | 6.89 |
| Normalized (contract mode) | 13.79 | 6.89 |

Normalization applied using bambara-text-normalization with the for_wer_evaluation() preset. Expand mode handles contraction disambiguation (e.g., k'a → ka a), while contract mode collapses expanded forms (e.g., bɛ a → b'a). See the normalizer documentation for details on contraction modes.
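Normalization directly lowers measured WER by removing spurious surface mismatches before scoring. The sketch below shows the mechanics with a toy lowercase/punctuation-stripping normalizer standing in for the bambara-text-normalization preset (the real package's API is not reproduced here), and a self-contained word-level WER implementation:

```python
import re

def toy_normalize(text: str) -> str:
    """Stand-in for the bambara-text-normalization preset:
    lowercase and strip punctuation (illustration only)."""
    return re.sub(r"[^\w\s']", "", text.lower()).strip()

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via Levenshtein distance over word tokens."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[-1][-1] / len(ref)

# Case differences and punctuation count as errors without normalization:
print(wer("Aw ni ce!", "aw ni ce"))                                # 2/3
print(wer(toy_normalize("Aw ni ce!"), toy_normalize("aw ni ce")))  # 0.0
```

The same pattern applies to CER by comparing character sequences instead of word tokens.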

*Figure: normalization impact*

Bambara ASR Leaderboard

Evaluated on MALIBA-AI/bambara-asr-benchmark: 1 hour of studio-recorded Malian constitutional text (pure Bambara), validated by linguists from Mali's DNENF-LN.

| Metric | Score |
|---|---|
| WER | 45.73% |
| CER | 13.45% |
| Combined (0.5 × WER + 0.5 × CER) | 29.59% |
| Rank | 🏆 1st / 37 models |
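The combined leaderboard score is a plain average of WER and CER:

```python
wer, cer = 45.73, 13.45  # benchmark scores from the table above
combined = 0.5 * wer + 0.5 * cer
print(round(combined, 2))  # 29.59
```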

See the Leaderboard

What Changed from v1

| | v1 | v3 |
|---|---|---|
| Base model | whisper-large-v2 | whisper-large-v3 |
| Training data | jeli-asr + Mali-Pense | jeli-asr + Mali-Pense + Bible + Common Voice (fr/en) |
| Benchmark WER | 61.74% | 45.73% |
| Benchmark CER | 17.90% | 13.45% |
| License | Apache 2.0 | CC-BY-NC-4.0 (non-commercial) |

The ~16 point WER improvement comes from both the stronger base model and broader, more diverse training data.
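The improvement quoted above is simply the benchmark WER delta between versions:

```python
wer_v1, wer_v3 = 61.74, 45.73  # leaderboard WER for each version
print(round(wer_v1 - wer_v3, 2))  # 16.01 points
```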

*Figure: v1 vs v3 comparison*

Training Data

| Source | Language | Description |
|---|---|---|
| RobotsMali/jeli-asr | Bambara | Conversational and read speech |
| Mali-Pense | Bambara | Transcribed audio from the Mali-Pense linguistic project |
| Bible recordings | Bambara | Narrated Bible text (formal register); non-commercial license |
| Common Voice | French, English | Supplementary multilingual data |

The mix of conversational (jeli-asr), formal/literary (Bible, Mali-Pense), and multilingual (Common Voice) data gives broader coverage than previous Bambara-only systems. The French and English data helps the model handle code-switching, which is common in everyday Bambara speech in Mali.

Training

  • Base model: openai/whisper-large-v3
  • Method: LoRA (Low-Rank Adaptation) via PEFT
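A LoRA adapter configuration of the kind described above might look like the following sketch. The rank, alpha, dropout, and target modules here are illustrative assumptions, not the hyperparameters actually used to train bambara-asr-v3:

```python
from peft import LoraConfig

# Hypothetical adapter configuration for Whisper fine-tuning.
# All hyperparameter values below are assumptions for illustration.
lora_config = LoraConfig(
    r=32,                                 # low-rank dimension (assumed)
    lora_alpha=64,                        # scaling factor (assumed)
    target_modules=["q_proj", "v_proj"],  # attention projections (assumed)
    lora_dropout=0.05,
)
```

With PEFT, this config is applied to the base model via `get_peft_model`, so only the small adapter matrices are trained while the frozen Whisper weights are reused.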

Usage

```bash
pip install git+https://github.com/sudoping01/whosper.git
```

```python
from whosper import WhosperTranscriber

transcriber = WhosperTranscriber(model_id="MALIBA-AI/bambara-asr-v3")

result = transcriber.transcribe_audio("path/to/audio.wav")
print(result)
```

Intended Use

For:

  • Research on Bambara and low-resource African language ASR
  • Benchmarking and comparison of ASR systems
  • Transcription assistance
  • Educational and non-profit applications
  • Linguistic analysis and documentation

Not for:

  • Commercial products or services (license restriction from Bible training data)
  • Medical, legal, or safety-critical transcription

License

CC-BY-NC-4.0 — non-commercial use only.

This restriction exists because the Bible audio used in training does not permit commercial use. If you need a commercially-licensed Bambara ASR model, see MALIBA-AI/bambara-asr-v1 (Apache 2.0, lower accuracy).

Citation

@misc{maliba_asr_v3,
  author       = {{MALIBA-AI}},
  title        = {MALIBA-AI Bambara ASR v3},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/MALIBA-AI/bambara-asr-v3}}
}

If reporting benchmark results, please also cite:

@misc{BambaraASRBenchmark2025,
  title        = {Where Are We at with Automatic Speech Recognition for the Bambara Language?},
  author       = {Seydou Diallo and Yacouba Diarra and Mamadou K. Keita and Panga Azazia Kamat{\'e} and Adam Bouno Kampo and Aboubacar Ouattara},
  year         = {2025},
  howpublished = {Hugging Face Datasets},
  url          = {https://huggingface.co/datasets/MALIBA-AI/bambara-asr-benchmark}
}