---
library_name: transformers
pipeline_tag: automatic-speech-recognition
language:
- bem
---

# 🫢 NextInnoMind / next_bemba_ai

**Bemba Whisper ASR (Automatic Speech Recognition)**

Fine-tuned Whisper model for the Bemba language only. Developed and maintained by **NextInnoMind**, led by **Chalwe Silas**.

---

### 🧪 Model Type

`WhisperForConditionalGeneration`, fine-tuned from [openai/whisper-small](https://huggingface.co/openai/whisper-small)

* Framework: `Transformers`
* Checkpoint Format: `Safetensors`
* Language: `Bemba`

---

## 📜 Model Description

This model is a Whisper Small variant fine-tuned exclusively for **Bemba**, a major Zambian language. It is designed to improve ASR performance for local languages and promote indigenous language technology.

---

## 📚 Training Details

* **Base Model**: [`openai/whisper-small`](https://huggingface.co/openai/whisper-small)
* **Dataset**: BembaSpeech (curated dataset of Bemba audio + transcripts)
* **Training Time**: 8 epochs (~45 hours on an A100 GPU)
* **Learning Rate**: 1e-5
* **Batch Size**: 16
* **Framework**: Transformers + Accelerate
* **Tokenizer**: WhisperProcessor with `task="transcribe"` (no language token used)

---

## 🚀 Usage

```python
from transformers import pipeline

# Load the fine-tuned Bemba checkpoint as an ASR pipeline
pipe = pipeline(
    "automatic-speech-recognition",
    model="NextInnoMind/next_bemba_ai",
    chunk_length_s=30,       # split long recordings into 30-second chunks
    return_timestamps=True,
)

# Example
result = pipe("path_to_audio.wav")
print(result["text"])
```

> 📌 Tip: No language token is required; the model is fine-tuned for Bemba only.

---

## 🔍 Applications

* **Education**: Local-language transcription and learning tools
* **Broadcast & Media**: Transcribing Bemba radio and TV shows
* **Research**: Bantu language documentation and analysis
* **Accessibility**: Voice-to-text systems in local apps and platforms

---

## ⚠️ Limitations & Biases

* Trained only on Bemba: it does not support English or other languages.
* Accuracy may drop with heavy background noise or strong dialectal variation.
* Not optimized for code-switching or informal speech styles.
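---

## 📏 Computing WER

The WER figure reported below is the standard word-level metric: the edit distance between reference and hypothesis transcripts, divided by the number of reference words. A minimal pure-Python sketch of the metric (illustrative only; the reported number was presumably produced with a standard tool such as `jiwer` or Hugging Face `evaluate`, and the function name here is just for demonstration):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Rolling dynamic-programming rows of the edit-distance table
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution (free if words match)
            )
        prev = curr
    return prev[len(hyp)] / max(len(ref), 1)
```

For example, one substitution in a three-word reference gives `word_error_rate("a b c", "a x c") == 1/3`.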
---

## 📊 Evaluation

| Language | WER (Word Error Rate) | Dataset              |
| -------- | --------------------- | -------------------- |
| Bemba    | ~16.7%                | BembaSpeech Eval Set |

---

## 🌱 Environmental Impact

* **Hardware**: 1x A100 40GB
* **Training Time**: ~45 hours
* **Carbon Emissions**: estimated ~20.4 kg CO₂ *(via [ML CO2 Impact](https://mlco2.github.io/impact))*

---

## 📄 Citation

```bibtex
@misc{nextbembaai2025,
  title={NextInnoMind next_bemba_ai: Whisper-based ASR model for Bemba},
  author={Silas Chalwe and NextInnoMind},
  year={2025},
  howpublished={\url{https://huggingface.co/NextInnoMind/next_bemba_ai}},
}
```

---

## 🧑‍💻 Maintainers

* **Chalwe Silas** (Lead Developer & Dataset Curator)
* Team **NextInnoMind**

📬 Contact:

* [silaschalwe@outlook.com](mailto:silaschalwe@outlook.com)
* [mchalwesilas@gmail.com](mailto:mchalwesilas@gmail.com)

🔗 GitHub: [SilasChalwe](https://github.com/SilasChalwe)

---

## 📌 Related Resources

* [BembaSpeech Dataset](https://huggingface.co/datasets/NextInnoMind/BembaSpeech)
* [NextInnoMind on GitHub](https://github.com/SilasChalwe)

---

Fine-tuned in Zambia.