<p align="center">
<img src="https://huggingface.co/front/assets/huggingface_logo-noborder.svg" height="80" />
</p>
# 🫒 NextInnoMind / next_bemba_ai_medium
**Multilingual Whisper ASR (Automatic Speech Recognition)**
Fine-tuned Whisper model for Bemba and English using language tokens.
Developed and maintained by **NextInnoMind**, led by **Chalwe Silas**.
---
### πŸ§ͺ Model Type
`WhisperForConditionalGeneration` — fine-tuned from [openai/whisper-medium](https://huggingface.co/openai/whisper-medium)
Framework: `Transformers`
Checkpoint Format: `Safetensors`
Languages: `Bemba`, `English` (with `<|bem|>` language token support)
---
## πŸ“œ Model Description
This model is a Whisper Medium variant fine-tuned for **Bemba** and **English**, enabling robust multilingual transcription. It supports the use of language tokens (e.g., `<|bem|>`) to help guide decoding, particularly for low-resource languages like Bemba.
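
Conceptually, Whisper steers decoding by prepending special tokens to the decoder prompt; a checkpoint that adds `<|bem|>` would place it in the language slot of that prompt. The sketch below is illustrative only (the token names other than `<|bem|>` follow standard Whisper conventions; this is not the model's internal code):

```python
# Conceptual sketch of Whisper's decoder prompt with a custom language token.
# Whisper conditions generation on a prefix of special tokens; a fine-tuned
# checkpoint that adds <|bem|> would put it in the language position.

def build_decoder_prompt(language_token: str, task: str = "transcribe",
                         timestamps: bool = False) -> list[str]:
    """Assemble the special-token prefix that Whisper decodes after."""
    prompt = ["<|startoftranscript|>", language_token, f"<|{task}|>"]
    if not timestamps:
        prompt.append("<|notimestamps|>")
    return prompt

print(build_decoder_prompt("<|bem|>"))
# ['<|startoftranscript|>', '<|bem|>', '<|transcribe|>', '<|notimestamps|>']
```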
---
## πŸ“š Training Details
* **Base Model**: [`openai/whisper-medium`](https://huggingface.co/openai/whisper-medium)
* **Dataset**:
* BembaSpeech (curated dataset of Bemba audio + transcripts)
* English subset of [Common Voice](https://huggingface.co/datasets/mozilla-foundation/common_voice_13_0)
* **Training Time**: 8 epochs (~55 hours on A100 GPU)
* **Learning Rate**: 1e-5
* **Batch Size**: 16
* **Framework**: Transformers + Accelerate
* **Tokenizer**: WhisperProcessor with `language="<|bem|>"` and `task="transcribe"`
---
## πŸš€ Usage
```python
from transformers import pipeline

# Load the fine-tuned checkpoint as an ASR pipeline
pipe = pipeline(
    "automatic-speech-recognition",
    model="NextInnoMind/next_bemba_ai_medium",
    chunk_length_s=30,       # split long audio into 30 s windows
    return_timestamps=True,
)

# Example: transcribe an audio file
result = pipe("path_to_audio.wav")
print(result["text"])
```
> πŸ“Œ Tip: For Bemba, use the language token `<|bem|>` to improve transcription accuracy.
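
One way to pin the language at generation time is to force the language and task tokens at fixed decoder positions (the `forced_decoder_ids` argument of `generate` in Transformers). The helper below is a hedged sketch with a toy vocabulary; real ids would come from this model's tokenizer via `tokenizer.convert_tokens_to_ids`:

```python
# Sketch: assemble forced decoder ids so generation starts with a fixed
# language token and task token. The toy vocab and its ids are illustrative,
# NOT the real ids from this model's tokenizer.

def forced_decoder_ids(vocab: dict[str, int], language_token: str,
                       task_token: str = "<|transcribe|>") -> list[tuple[int, int]]:
    """Pin decoder positions 1 and 2 to the language and task tokens."""
    return [(1, vocab[language_token]), (2, vocab[task_token])]

toy_vocab = {"<|bem|>": 51865, "<|transcribe|>": 50359}  # illustrative ids
print(forced_decoder_ids(toy_vocab, "<|bem|>"))
# [(1, 51865), (2, 50359)]
```

The resulting list can then be passed as `model.generate(..., forced_decoder_ids=...)`.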
---
## πŸ” Applications
* **Multilingual Education**: Bemba-English subtitles and transcription
* **Broadcast & Media**: Transcribe bilingual radio or TV content
* **Research**: Language preservation and Bantu-English linguistic studies
* **Voice Accessibility**: Multilingual ASR tools and captioning
---
## ⚠️ Limitations & Biases
* Slight performance drop with highly noisy or code-switched audio
* Trained on formal and clean speech; informal speech may lower accuracy
* `<|bem|>` is required for optimal Bemba decoding
---
## πŸ“Š Evaluation
| Language | WER (Word Error Rate) | Dataset |
| -------- | --------------------- | -------------------- |
| Bemba | ~15.2% | BembaSpeech Eval Set |
| English | ~10.5% | Common Voice EN |
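
The WER figures above are word-level edit distance divided by the number of reference words. A minimal reference implementation of the metric (not the evaluation script actually used for this model; the sample phrase is only an illustration):

```python
# Word Error Rate: Levenshtein distance over words / reference word count.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # One-row dynamic programming over the edit-distance table.
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1,        # deletion
                                   d[j - 1] + 1,    # insertion
                                   prev + (r != h)) # substitution / match
    return d[-1] / len(ref)

print(round(wer("uli shani mukwai", "uli shani"), 2))
# 0.33  (1 deletion over 3 reference words)
```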
---
## 🌱 Environmental Impact
* **Hardware**: A100 40GB x1
* **Training Time**: ~55 hours
* **Carbon Emissions**: Estimated ~25.8 kg COβ‚‚
*(via [ML CO2 Impact](https://mlco2.github.io/impact))*
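
For context, an estimate of this order can be reproduced with simple arithmetic, assuming an average draw of ~450 W for the single-A100 node and a grid carbon intensity of ~1.04 kg COβ‚‚/kWh (both figures are assumptions for illustration, not measured values):

```python
# Back-of-the-envelope carbon estimate: energy used x grid carbon intensity.
power_kw = 0.45    # assumed average node draw (kW)
hours = 55         # training time from this card
intensity = 1.04   # assumed grid intensity (kg CO2 per kWh)

energy_kwh = power_kw * hours        # 24.75 kWh
emissions = energy_kwh * intensity   # kg CO2
print(round(emissions, 1))
# 25.7
```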
---
## πŸ“„ Citation
```bibtex
@misc{nextbembaai2025,
title={NextInnoMind next_bemba_ai_medium: Multilingual Whisper ASR model for Bemba and English},
author={Silas Chalwe and NextInnoMind},
year={2025},
howpublished={\url{https://huggingface.co/NextInnoMind/next_bemba_ai_medium}},
}
```
---
## πŸ§‘β€πŸ’» Maintainers
* **Chalwe Silas** (Lead Developer & Dataset Curator)
* Team **NextInnoMind**
πŸ“¬ Contact:
* [silaschalwe@outlook.com](mailto:silaschalwe@outlook.com)
* [mchalwesilas@gmail.com](mailto:mchalwesilas@gmail.com)
πŸ”— GitHub: [SilasChalwe](https://github.com/SilasChalwe)
---
## πŸ“Œ Related Resources
* [BembaSpeech Dataset](https://huggingface.co/datasets/NextInnoMind/BembaSpeech)
* [NextInnoMind on GitHub](https://github.com/SilasChalwe)
---
Fine-tuned in Zambia.