---
language:
- ak
license: cc-by-nc-4.0
base_model: katrintomanek/whisper-large-v3-turbo_Akan_standardspeech_specaugment
tags:
- whisper
- lora
- asante-twi
- akan
- speech-recognition
- peft
datasets:
- mozilla-foundation/common_voice_11_0
- michsethowusu/twi_multispeaker_audio_transcribed
---

# Whisper Large v3 Turbo — Asante Twi LoRA Adapter (R14)

Fine-tuned LoRA adapter for Asante Twi automatic speech recognition, built on top of
`katrintomanek/whisper-large-v3-turbo_Akan_standardspeech_specaugment`.

**WER: 17.5%** on the LVP held-out eval set (pilot-ready threshold: <22%).

## Usage

Load the base model and apply the adapter with PEFT:

```python
from peft import PeftModel
from transformers import AutoModelForSpeechSeq2Seq

# Whisper is a speech seq2seq model, so it loads via AutoModelForSpeechSeq2Seq.
base_model = AutoModelForSpeechSeq2Seq.from_pretrained(
    "katrintomanek/whisper-large-v3-turbo_Akan_standardspeech_specaugment"
)
model = PeftModel.from_pretrained(base_model, "rootabytes/Rootal-Twi-ASR")
```
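For reference, word error rate (the metric reported above) is the word-level edit distance between hypothesis and reference, normalized by reference length. Below is a minimal pure-Python sketch of the standard computation; the example strings are illustrative, not drawn from the eval set.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # One-row dynamic programming over the edit-distance table:
    # d[j] = distance between the current reference prefix and hyp[:j].
    d = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev_diag, d[0] = d[0], i
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            prev_diag, d[j] = d[j], min(
                d[j] + 1,          # deletion
                d[j - 1] + 1,      # insertion
                prev_diag + cost,  # substitution (or match, cost 0)
            )
    return d[-1] / len(ref)


# One dropped word out of four reference words -> WER 0.25.
print(wer("me din de kofi", "me din kofi"))  # -> 0.25
```

In practice libraries such as `jiwer` apply the same edit-distance definition, usually after text normalization (casing, punctuation), which can shift the reported number.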
## Training Data

| Dataset | Role | Notes |
|---|---|---|
| LVP real recordings (private) | Training + eval | Collected via the Rootal Audio Annotation Platform (rootal.ai); available on request |
| LVP synthetic QA (private) | Training | TTS-generated Twi Q&A pairs |
| Common Voice Akan | Training | Mozilla, CC0 |
| Financial Inclusion Speech Dataset (Ashesi) | Training (200 samples) | See citation below |
| michsethowusu/twi_multispeaker_audio_transcribed | Eval-only diagnostic | Excluded from training due to a transcription-style mismatch |
## Training Configuration

- **Base model**: `katrintomanek/whisper-large-v3-turbo_Akan_standardspeech_specaugment`
- **LoRA**: rank=32, alpha=64; target modules: q_proj, k_proj, v_proj, out_proj, fc1, fc2
- **Language**: `None` (Twi is not in Whisper's vocabulary, so no language prefix token is used)
- **Anti-hallucination**: `condition_on_prev_tokens=False`, `repetition_penalty=1.2`
- **Quantization**: 8-bit (BitsAndBytes)
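The decoding and loading settings above can be grouped into plain keyword-argument dicts. This is an illustrative sketch only — the dict names (`load_kwargs`, `gen_kwargs`) are mine, not from the training code — showing where each setting would typically be passed in a `transformers` + PEFT inference path.

```python
# Hypothetical grouping of the settings listed above.
load_kwargs = {
    "load_in_8bit": True,  # 8-bit quantization via BitsAndBytes
}
gen_kwargs = {
    "language": None,                    # no Whisper language prefix token for Twi
    "condition_on_prev_tokens": False,   # don't condition on previous segments
    "repetition_penalty": 1.2,           # discourage repetition loops
}

# Sketch of usage (requires transformers, peft, bitsandbytes, and a GPU),
# shown as comments because it needs the model weights to run:
#   base = AutoModelForSpeechSeq2Seq.from_pretrained(base_id, **load_kwargs)
#   model = PeftModel.from_pretrained(base, "rootabytes/Rootal-Twi-ASR")
#   ids = model.generate(input_features, **gen_kwargs)
print(gen_kwargs["repetition_penalty"])  # -> 1.2
```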
## Citation

If you use this adapter, please cite:

```bibtex
@misc{aguyatimothy2025asantetwi,
  author    = {Timothy Aguya, Akasiya},
  title     = {Whisper Large v3 Turbo — Asante Twi LoRA Adapter},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/rootabytes/whisper-large-v3-turbo-asante-twi-lvp}
}

@misc{financialinclusion2022,
  author    = {Asamoah Owusu, D. and Korsah, A. and Quartey, B. and Nwolley Jnr., S.
               and Sampah, D. and Adjepon-Yamoah, D. and Omane Boateng, L.},
  title     = {Financial Inclusion Speech Dataset},
  year      = {2022},
  publisher = {Ashesi University and Nokwary Technologies},
  url       = {https://github.com/Ashesi-Org/Financial-Inclusion-Speech-Dataset}
}

@inproceedings{ardila2020common,
  title     = {Common Voice: A Massively-Multilingual Speech Corpus},
  author    = {Ardila, Rosana and others},
  booktitle = {LREC},
  year      = {2020}
}

@article{radford2022robust,
  title   = {Robust Speech Recognition via Large-Scale Weak Supervision},
  author  = {Radford, Alec and others},
  journal = {arXiv:2212.04356},
  year    = {2022}
}

@article{hu2021lora,
  title   = {LoRA: Low-Rank Adaptation of Large Language Models},
  author  = {Hu, Edward J. and others},
  journal = {arXiv:2106.09685},
  year    = {2021}
}
```