<p align="center">
<img src="https://huggingface.co/front/assets/huggingface_logo-noborder.svg" height="80" />
</p>
# 🎙️ NextInnoMind / next_bemba_ai_medium

**Multilingual Whisper ASR (Automatic Speech Recognition)**

Fine-tuned Whisper model for Bemba and English using language tokens.
Developed and maintained by **NextInnoMind**, led by **Chalwe Silas**.
---
## 🧪 Model Type

`WhisperForConditionalGeneration`, fine-tuned from [openai/whisper-medium](https://huggingface.co/openai/whisper-medium)

Framework: `Transformers`
Checkpoint Format: `Safetensors`
Languages: `Bemba`, `English` (with `<|bem|>` language token support)
---
## 📝 Model Description

This model is a Whisper Medium variant fine-tuned for **Bemba** and **English**, enabling robust multilingual transcription. It supports language tokens (e.g., `<|bem|>`) to guide decoding, which is particularly helpful for low-resource languages like Bemba.
---
## 📊 Training Details

* **Base Model**: [`openai/whisper-medium`](https://huggingface.co/openai/whisper-medium)
* **Datasets**:
  * BembaSpeech (curated dataset of Bemba audio + transcripts)
  * English subset of [Common Voice](https://huggingface.co/datasets/mozilla-foundation/common_voice_13_0)
* **Training Time**: 8 epochs (~55 hours on an A100 GPU)
* **Learning Rate**: 1e-5
* **Batch Size**: 16
* **Framework**: Transformers + Accelerate
* **Tokenizer**: WhisperProcessor with `language="<|bem|>"` and `task="transcribe"`
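For context on the training setup above, Whisper's input geometry is fixed by the architecture (not specific to this card): every example is padded or trimmed to a 30-second window of 16 kHz audio and converted to an 80-bin log-mel spectrogram. The arithmetic is easy to check:

```python
# Whisper (medium) input geometry: 30 s of 16 kHz audio, an STFT hop of 160
# samples, and 80 mel bins -> each training example is an (80, 3000) tensor.
SAMPLE_RATE = 16_000   # Hz
CHUNK_SECONDS = 30     # fixed window length in seconds
HOP_LENGTH = 160       # samples between consecutive mel frames
N_MELS = 80            # mel filterbank size for whisper-medium

n_frames = SAMPLE_RATE * CHUNK_SECONDS // HOP_LENGTH
print((N_MELS, n_frames))  # (80, 3000)
```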
---
## 🚀 Usage

```python
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="NextInnoMind/next_bemba_ai_medium",
    chunk_length_s=30,
    return_timestamps=True,
)

# Example: transcribe a local audio file
result = pipe("path_to_audio.wav")
print(result["text"])
```
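The `chunk_length_s=30` argument makes the pipeline transcribe long recordings in 30-second windows. A simplified sketch of that partitioning (illustrative only; the real pipeline additionally uses overlapping strides so words are not cut at chunk boundaries):

```python
def chunk_offsets(duration_s, chunk_length_s=30.0):
    """Split a recording of `duration_s` seconds into (start, end) windows."""
    offsets, start = [], 0.0
    while start < duration_s:
        offsets.append((start, min(start + chunk_length_s, duration_s)))
        start += chunk_length_s
    return offsets

print(chunk_offsets(75.0))  # [(0.0, 30.0), (30.0, 60.0), (60.0, 75.0)]
```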
> 💡 Tip: For Bemba, use the language token `<|bem|>` to improve transcription accuracy.
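For readers unfamiliar with how Whisper's language tokens work: the decoder is primed with a short prompt of special tokens before generation, and the language token occupies the second slot. A sketch of that ordering (the token names follow Whisper's convention; `<|bem|>` is this model's custom Bemba token):

```python
def build_decoder_prompt(language_token, with_timestamps=False):
    """Whisper decoder prompt order: start, language, task, timestamps flag."""
    prompt = ["<|startoftranscript|>", language_token, "<|transcribe|>"]
    if not with_timestamps:
        prompt.append("<|notimestamps|>")
    return prompt

print(build_decoder_prompt("<|bem|>"))
# ['<|startoftranscript|>', '<|bem|>', '<|transcribe|>', '<|notimestamps|>']
```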
---
## 🌍 Applications
* **Multilingual Education**: Bemba-English subtitles and transcription
* **Broadcast & Media**: Transcribe bilingual radio or TV content
* **Research**: Language preservation and Bantu-English linguistic studies
* **Voice Accessibility**: Multilingual ASR tools and captioning
---
## ⚠️ Limitations & Biases
* Slight performance drop with highly noisy or code-switched audio
* Trained on formal and clean speech; informal speech may lower accuracy
* `<|bem|>` is required for optimal Bemba decoding
---
## 📈 Evaluation

| Language | WER (Word Error Rate) | Dataset              |
| -------- | --------------------- | -------------------- |
| Bemba    | ~15.2%                | BembaSpeech Eval Set |
| English  | ~10.5%                | Common Voice EN      |
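WER, the metric reported above, is the word-level edit distance between the reference and hypothesis transcripts divided by the reference word count. A minimal implementation (the Bemba phrase below is only an illustration, not from the evaluation set):

```python
def wer(reference, hypothesis):
    """Word error rate: Levenshtein distance over words / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    d = list(range(len(hyp) + 1))          # distances against the empty reference
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i               # prev holds the diagonal cell
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1,        # deletion
                                   d[j - 1] + 1,    # insertion
                                   prev + (r != h)) # substitution / match
    return d[-1] / max(len(ref), 1)

# One substituted word out of four -> 25% WER
print(wer("umwana aleya ku sukulu", "umwana aleya mu sukulu"))  # 0.25
```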
---
## 🌱 Environmental Impact

* **Hardware**: A100 40GB x1
* **Training Time**: ~55 hours
* **Carbon Emissions**: estimated ~25.8 kg CO₂
  *(via [ML CO2 Impact](https://mlco2.github.io/impact))*
---
## 📚 Citation
```bibtex
@misc{nextbembaai2025,
title={NextInnoMind next_bemba_ai_medium: Multilingual Whisper ASR model for Bemba and English},
author={Silas Chalwe and NextInnoMind},
year={2025},
howpublished={\url{https://huggingface.co/NextInnoMind/next_bemba_ai_medium}},
}
```
---
## 🧑‍💻 Maintainers

* **Chalwe Silas** (Lead Developer & Dataset Curator)
* Team **NextInnoMind**

💬 Contact:

* [silaschalwe@outlook.com](mailto:silaschalwe@outlook.com)
* [mchalwesilas@gmail.com](mailto:mchalwesilas@gmail.com)

🔗 GitHub: [SilasChalwe](https://github.com/SilasChalwe)
---
## 🔗 Related Resources
* [BembaSpeech Dataset](https://huggingface.co/datasets/NextInnoMind/BembaSpeech)
* [NextInnoMind on GitHub](https://github.com/SilasChalwe)
---
Fine-tuned in Zambia.