---
library_name: transformers
pipeline_tag: automatic-speech-recognition
language:
- bem
base_model: openai/whisper-small
---

<p align="center">
<img src="https://huggingface.co/front/assets/huggingface_logo-noborder.svg" height="80" />
</p>
|
|
# 🎙️ NextInnoMind / next\_bemba\_ai
|
|
**Bemba Whisper ASR (Automatic Speech Recognition)**
Fine-tuned Whisper model for the Bemba language only.
Developed and maintained by **NextInnoMind**, led by **Chalwe Silas**.
|
|
---

## 🧪 Model Type

* **Architecture**: `WhisperForConditionalGeneration`, fine-tuned from [openai/whisper-small](https://huggingface.co/openai/whisper-small)
* **Framework**: Transformers
* **Checkpoint format**: Safetensors
* **Language**: Bemba
|
|
---

## 📝 Model Description

This model is a Whisper Small variant fine-tuned exclusively for **Bemba**, a major Zambian language. It is designed to improve local-language ASR performance and promote indigenous language technology.
|
|
---

## 🏋️ Training Details

* **Base Model**: [`openai/whisper-small`](https://huggingface.co/openai/whisper-small)
* **Dataset**: BembaSpeech (curated dataset of Bemba audio + transcripts)
* **Training Time**: 8 epochs (~45 hours on an A100 GPU)
* **Learning Rate**: 1e-5
* **Batch Size**: 16
* **Framework**: Transformers + Accelerate
* **Tokenizer**: WhisperProcessor with `task="transcribe"` (no language token used)
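
Because decoding uses `task="transcribe"` with no language token, the model can also be called below the pipeline level without any language hint. A minimal sketch, assuming a 16 kHz mono float waveform (the `transcribe` helper is illustrative, not part of the released API):

```python
# Low-level inference matching the training setup: task="transcribe",
# no language token. Assumes `waveform` is a 16 kHz mono float array.
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor

MODEL_ID = "NextInnoMind/next_bemba_ai"

def transcribe(waveform, sampling_rate=16_000, model_id=MODEL_ID):
    processor = WhisperProcessor.from_pretrained(model_id)
    model = WhisperForConditionalGeneration.from_pretrained(model_id)
    inputs = processor(waveform, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        generated = model.generate(inputs.input_features, task="transcribe")
    return processor.batch_decode(generated, skip_special_tokens=True)[0]
```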
|
|
---

## 🚀 Usage

```python
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="NextInnoMind/next_bemba_ai",
    chunk_length_s=30,
    return_timestamps=True,
)

# Example
result = pipe("path_to_audio.wav")
print(result["text"])
```
|
|
> 💡 Tip: No language token is required. The model is fine-tuned for Bemba only.
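
Whisper models expect 16 kHz mono input. The pipeline resamples files it loads itself, but if you pass a raw array you must match that rate yourself. A minimal NumPy sketch, assuming linear interpolation is acceptable (the helper name is illustrative; `librosa` or `torchaudio` offer higher-quality resampling):

```python
import numpy as np

def to_whisper_input(waveform: np.ndarray, orig_sr: int, target_sr: int = 16_000) -> np.ndarray:
    """Downmix to mono and linearly resample to Whisper's expected 16 kHz."""
    if waveform.ndim == 2:  # (channels, samples) -> mono
        waveform = waveform.mean(axis=0)
    if orig_sr == target_sr:
        return waveform.astype(np.float32)
    n_out = int(round(len(waveform) * target_sr / orig_sr))
    # Sample positions of the old and new grids on a common [0, 1) axis
    x_old = np.linspace(0.0, 1.0, num=len(waveform), endpoint=False)
    x_new = np.linspace(0.0, 1.0, num=n_out, endpoint=False)
    return np.interp(x_new, x_old, waveform).astype(np.float32)
```

A resampled array can then be passed to the pipeline directly as `pipe({"raw": to_whisper_input(wav, sr), "sampling_rate": 16_000})`.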
|
|
---

## 🌍 Applications

* **Education**: Local-language transcription and learning tools
* **Broadcast & Media**: Transcribing Bemba radio and TV shows
* **Research**: Bantu language documentation and analysis
* **Accessibility**: Voice-to-text systems in local apps and platforms
|
|
---

## ⚠️ Limitations & Biases

* Trained only on Bemba: the model does not support English or other languages.
* Accuracy may drop with heavy background noise or strong dialectal variation.
* Not optimized for code-switching or informal speech styles.
|
|
---

## 📊 Evaluation

| Language | WER (Word Error Rate) | Dataset              |
| -------- | --------------------- | -------------------- |
| Bemba    | ~16.7%                | BembaSpeech Eval Set |
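
For reference, WER counts the word-level substitutions, insertions, and deletions needed to turn a hypothesis into the reference, divided by the reference length. A minimal sketch for spot-checking transcripts locally (assumes whitespace tokenization; the `jiwer` package is the usual production choice):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via word-level Levenshtein (edit) distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # DP table: d[i][j] = edits to turn hyp[:j] into ref[:i]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1  # substitution
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[len(ref)][len(hyp)] / len(ref)
```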
|
|
---

## 🌱 Environmental Impact

* **Hardware**: 1× A100 40GB
* **Training Time**: ~45 hours
* **Carbon Emissions**: estimated ~20.4 kg CO₂
  *(via [ML CO2 Impact](https://mlco2.github.io/impact))*
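
Estimates like the one above follow the standard formula: energy (average power × runtime) multiplied by the grid's carbon intensity. A sketch with illustrative, assumed inputs (single-GPU board power only, which is one reason it lands below the whole-run ~20.4 kg figure, since full-node power and the local grid mix push the estimate higher):

```python
# Back-of-envelope CO2 estimate: power (kW) x time (h) x grid intensity
# (kg CO2 per kWh). All three inputs below are assumptions for illustration.
avg_power_kw = 0.4     # rough A100 board draw under load (assumed)
runtime_hours = 45     # from the training details above
grid_intensity = 0.43  # kg CO2 per kWh, an illustrative grid average (assumed)

energy_kwh = avg_power_kw * runtime_hours    # 18.0 kWh
emissions_kg = energy_kwh * grid_intensity   # ~7.7 kg CO2 for the GPU alone
print(f"{emissions_kg:.1f} kg CO2")
```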
|
|
---

## 📖 Citation

```bibtex
@misc{nextbembaai2025,
  title={NextInnoMind next_bemba_ai: Whisper-based ASR model for Bemba},
  author={Silas Chalwe and NextInnoMind},
  year={2025},
  howpublished={\url{https://huggingface.co/NextInnoMind/next_bemba_ai}},
}
```
|
|
---

## 🧑‍💻 Maintainers

* **Chalwe Silas** (Lead Developer & Dataset Curator)
* Team **NextInnoMind**

💬 Contact:

* [silaschalwe@outlook.com](mailto:silaschalwe@outlook.com)
* [mchalwesilas@gmail.com](mailto:mchalwesilas@gmail.com)

🔗 GitHub: [SilasChalwe](https://github.com/SilasChalwe)
|
|
---

## 🔗 Related Resources

* [BembaSpeech Dataset](https://huggingface.co/datasets/NextInnoMind/BembaSpeech)
* [NextInnoMind on GitHub](https://github.com/SilasChalwe)
|
|
---

Fine-tuned in Zambia.