<p align="center">
  <img src="https://huggingface.co/front/assets/huggingface_logo-noborder.svg" height="80" />
</p>

# 🫒 NextInnoMind / next_bemba_ai_medium

**Multilingual Whisper ASR (Automatic Speech Recognition)**
A Whisper model fine-tuned for Bemba and English transcription, with language-token support.
Developed and maintained by **NextInnoMind**, led by **Chalwe Silas**.

---

## 🧪 Model Type

`WhisperForConditionalGeneration`, fine-tuned from [openai/whisper-medium](https://huggingface.co/openai/whisper-medium)
Framework: `Transformers`
Checkpoint Format: `Safetensors`
Languages: `Bemba`, `English` (with `<|bem|>` language token support)

---

## 📜 Model Description

This model is a Whisper Medium variant fine-tuned for **Bemba** and **English**, enabling robust multilingual transcription. It supports the use of language tokens (e.g., `<|bem|>`) to help guide decoding, particularly for low-resource languages like Bemba.

---

## 📚 Training Details

* **Base Model**: [`openai/whisper-medium`](https://huggingface.co/openai/whisper-medium)
* **Dataset**:

  * BembaSpeech (curated dataset of Bemba audio + transcripts)
  * English subset of [Common Voice](https://huggingface.co/datasets/mozilla-foundation/common_voice_13_0)
* **Training Time**: 8 epochs (\~55 hours on an A100 GPU)
* **Learning Rate**: 1e-5
* **Batch Size**: 16
* **Framework**: Transformers + Accelerate
* **Tokenizer**: WhisperProcessor with `language="<|bem|>"` and `task="transcribe"`

---

## 🚀 Usage

```python
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="NextInnoMind/next_bemba_ai_medium",
    chunk_length_s=30,
    return_timestamps=True
)

# Example
result = pipe("path_to_audio.wav")
print(result["text"])
```

> 📌 Tip: For Bemba, use the language token `<|bem|>` to improve transcription accuracy.
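
One way to act on that tip is to pass the token through `generate_kwargs` (a sketch, assuming `<|bem|>` is registered in this checkpoint's generation config; a second of 16 kHz silence stands in for real audio, and the full model is downloaded from the Hub):

```python
import numpy as np
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="NextInnoMind/next_bemba_ai_medium",
    chunk_length_s=30,
)

# Replace with a path to real audio; raw numpy input is assumed to be 16 kHz mono.
audio = np.zeros(16000, dtype=np.float32)

# Force Bemba decoding via the language token (assumes <|bem|> is registered
# in this checkpoint's generation config; otherwise omit generate_kwargs).
result = pipe(audio, generate_kwargs={"language": "<|bem|>"})
print(result["text"])
```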

---

## 🔍 Applications

* **Multilingual Education**: Bemba-English subtitles and transcription
* **Broadcast & Media**: Transcribe bilingual radio or TV content
* **Research**: Language preservation and Bantu-English linguistic studies
* **Voice Accessibility**: Multilingual ASR tools and captioning

---

## ⚠️ Limitations & Biases

* Slight performance drop with highly noisy or code-switched audio
* Trained on formal and clean speech; informal speech may lower accuracy
* `<|bem|>` is required for optimal Bemba decoding

---

## 📊 Evaluation

| Language | WER (Word Error Rate) | Dataset              |
| -------- | --------------------- | -------------------- |
| Bemba    | \~15.2%               | BembaSpeech Eval Set |
| English  | \~10.5%               | Common Voice EN      |
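
For reference, WER is the word-level edit distance between reference and hypothesis transcripts, divided by the reference length. A minimal sketch (evaluation toolkits such as `jiwer` are normally used in practice; the sample sentences are illustrative only):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / len(ref)

# 1 substitution over 4 reference words yields 0.25
print(wer("ndefwaya ukuya ku musumba", "ndefwaya ukuya mu musumba"))
```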

---

## 🌱 Environmental Impact

* **Hardware**: 1× NVIDIA A100 (40 GB)
* **Training Time**: \~55 hours
* **Carbon Emissions**: Estimated \~25.8 kg CO₂
  *(via [ML CO2 Impact](https://mlco2.github.io/impact))*

---

## 📄 Citation

```bibtex
@misc{nextbembaai2025,
  title={NextInnoMind next_bemba_ai_medium: Multilingual Whisper ASR model for Bemba and English},
  author={Silas Chalwe and NextInnoMind},
  year={2025},
  howpublished={\url{https://huggingface.co/NextInnoMind/next_bemba_ai_medium}},
}
```

---

## 🧑‍💻 Maintainers

* **Chalwe Silas** (Lead Developer & Dataset Curator)
* Team **NextInnoMind**

📬 Contact:

* [silaschalwe@outlook.com](mailto:silaschalwe@outlook.com)
* [mchalwesilas@gmail.com](mailto:mchalwesilas@gmail.com)

🔗 GitHub: [SilasChalwe](https://github.com/SilasChalwe)

---

## 📌 Related Resources

* [BembaSpeech Dataset](https://huggingface.co/datasets/NextInnoMind/BembaSpeech)
* [NextInnoMind on GitHub](https://github.com/SilasChalwe)

---

Fine-tuned in Zambia.