---
library_name: transformers
pipeline_tag: automatic-speech-recognition
language:
- bem
---

# 🫢 NextInnoMind / next_bemba_ai

**Bemba Whisper ASR (Automatic Speech Recognition)**

Fine-tuned Whisper model for the Bemba language only. Developed and maintained by **NextInnoMind**, led by **Chalwe Silas**.

---

### 🧪 Model Type

`WhisperForConditionalGeneration`, fine-tuned from [openai/whisper-small](https://huggingface.co/openai/whisper-small)

* Framework: `Transformers`
* Checkpoint Format: `Safetensors`
* Language: `Bemba`

---

## 📜 Model Description

This model is a Whisper Small variant fine-tuned exclusively for **Bemba**, a major Zambian language. It is designed to improve ASR performance for local languages and promote indigenous language technology.

---

## 📚 Training Details

* **Base Model**: [`openai/whisper-small`](https://huggingface.co/openai/whisper-small)
* **Dataset**: BembaSpeech (curated dataset of Bemba audio + transcripts)
* **Training Time**: 8 epochs (~45 hours on an A100 GPU)
* **Learning Rate**: 1e-5
* **Batch Size**: 16
* **Framework**: Transformers + Accelerate
* **Tokenizer**: WhisperProcessor with `task="transcribe"` (no language token used)

---

## 🚀 Usage

```python
from transformers import pipeline

# Load the fine-tuned Bemba checkpoint as an ASR pipeline
pipe = pipeline(
    "automatic-speech-recognition",
    model="NextInnoMind/next_bemba_ai",
    chunk_length_s=30,       # split long recordings into 30-second chunks
    return_timestamps=True,
)

# Example
result = pipe("path_to_audio.wav")
print(result["text"])
```

> 📌 Tip: No language token is required; the model is fine-tuned for Bemba only.

---

## 🔍 Applications

* **Education**: Local-language transcription and learning tools
* **Broadcast & Media**: Transcribing Bemba radio and TV shows
* **Research**: Bantu language documentation and analysis
* **Accessibility**: Voice-to-text systems in local apps and platforms

---

## ⚠️ Limitations & Biases

* Trained only on Bemba: it does not support English or other languages.
* Accuracy may drop with heavy background noise or strong dialectal variation.
* Not optimized for code-switching or informal speech styles.
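---

## 📏 Computing WER

The WER figure reported below is the standard word-level metric: the edit distance between reference and hypothesis transcripts, divided by the number of reference words. A minimal pure-Python sketch of the metric (illustrative only; the reported number was presumably produced with a standard tool such as `jiwer` or Hugging Face `evaluate`, and the function name here is just for demonstration):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Rolling dynamic-programming rows of the edit-distance table
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution (free if words match)
            )
        prev = curr
    return prev[len(hyp)] / max(len(ref), 1)
```

For example, one substitution in a three-word reference gives `word_error_rate("a b c", "a x c") == 1/3`.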
---

## 📊 Evaluation

| Language | WER (Word Error Rate) | Dataset              |
| -------- | --------------------- | -------------------- |
| Bemba    | ~16.7%                | BembaSpeech Eval Set |

---

## 🌱 Environmental Impact

* **Hardware**: 1x A100 40GB
* **Training Time**: ~45 hours
* **Carbon Emissions**: estimated ~20.4 kg CO₂ *(via [ML CO2 Impact](https://mlco2.github.io/impact))*

---

## 📄 Citation

```bibtex
@misc{nextbembaai2025,
  title={NextInnoMind next_bemba_ai: Whisper-based ASR model for Bemba},
  author={Silas Chalwe and NextInnoMind},
  year={2025},
  howpublished={\url{https://huggingface.co/NextInnoMind/next_bemba_ai}},
}
```

---

## 🧑‍💻 Maintainers

* **Chalwe Silas** (Lead Developer & Dataset Curator)
* Team **NextInnoMind**

📬 Contact:

* [silaschalwe@outlook.com](mailto:silaschalwe@outlook.com)
* [mchalwesilas@gmail.com](mailto:mchalwesilas@gmail.com)

🔗 GitHub: [SilasChalwe](https://github.com/SilasChalwe)

---

## 📌 Related Resources

* [BembaSpeech Dataset](https://huggingface.co/datasets/NextInnoMind/BembaSpeech)
* [NextInnoMind on GitHub](https://github.com/SilasChalwe)

---

Fine-tuned in Zambia.