English–Twi Code-Switching ASR Model - Kasanoma
Model Overview
This model is a fine-tuned Automatic Speech Recognition (ASR) system designed for English–Twi code-switching speech transcription. It is built on a pretrained Akan-adapted Whisper model and further fine-tuned on a curated bilingual dataset containing English, Twi, and mixed-language utterances.
The model supports natural bilingual speech, including intra-sentential and inter-sentential code-switching.
Base Model
GiftMark/akan-whisper-model
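A minimal sketch of loading the base checkpoint for transcription, assuming the Hugging Face `transformers` library and a PyTorch backend are installed. The helper names `build_transcriber` and `transcribe` are hypothetical and only for illustration. This card does not state the repository ID of the fine-tuned checkpoint, so the example defaults to the base model ID listed above.

```python
BASE_MODEL_ID = "GiftMark/akan-whisper-model"  # base model named in this card


def build_transcriber(model_id: str = BASE_MODEL_ID):
    """Return an ASR pipeline for the given checkpoint (hypothetical helper)."""
    # Imported lazily so that defining this helper does not require transformers.
    from transformers import pipeline

    return pipeline("automatic-speech-recognition", model=model_id)


def transcribe(audio_path: str, model_id: str = BASE_MODEL_ID) -> str:
    """Transcribe one audio file and return the predicted text."""
    asr = build_transcriber(model_id)
    # The ASR pipeline returns a dict whose "text" field holds the transcript.
    return asr(audio_path)["text"]
```

To try the fine-tuned model instead, pass its repository ID as `model_id`. Decoding audio from a file path typically also requires `ffmpeg` on the system.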
Task
- Automatic Speech Recognition (ASR)
- Code-switching speech transcription
- English and Twi bilingual speech recognition
Dataset
Kennethdot/Ghana_English-Twi_Code-switching_ASR
The dataset contains:
- Code-switched English–Twi speech
- Monolingual English and Twi speech
- Read and semi-spontaneous utterances
- Carefully transcribed bilingual speech with preserved linguistic structure
Evaluation Setup
Evaluation was performed using Word Error Rate (WER) without text normalization.
This means:
- No lowercasing
- No punctuation removal
- No orthographic normalization applied
WER reflects raw transcription fidelity.
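The effect of skipping normalization can be illustrated with a small, self-contained WER function. This is a sketch in plain Python that assumes whitespace tokenization; the card does not say which tool was used for scoring, so it is not necessarily the evaluation script behind the reported numbers. WER is the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words, so case or punctuation differences count as full word errors, and a hypothesis with many inserted words can score above 100.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent, comparing raw whitespace-separated tokens."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return 100.0 * prev[-1] / len(ref)


reference = "Wo nim sɛ I almost forgot to buy the food?"
# Same words, but lowercased first token and no final question mark.
hypothesis = "wo nim sɛ I almost forgot to buy the food"
print(word_error_rate(reference, hypothesis))  # 2 of 10 words differ -> 20.0
```

With lowercasing and punctuation removal applied to both strings, the same pair would score 0, which is why unnormalized WER is the stricter measure.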
Results
| Model | Code-switching (CS) WER (%) | Twi WER (%) | English WER (%) |
|---|---|---|---|
| Zero-shot Akan Whisper Small | 127.08 | 116.08 | 110.26 |
| Fine-tuned Model | 6.58 | 99.44 | 100.43 |
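To compare the rows on a common scale, the relative WER reduction can be computed from the table values. The snippet below is a small illustrative calculation; the function name `relative_wer_reduction` is hypothetical and the figures are copied from the table above.

```python
# (zero-shot WER, fine-tuned WER) for each test condition, from the table above
results = {
    "Code-switching": (127.08, 6.58),
    "Twi": (116.08, 99.44),
    "English": (110.26, 100.43),
}


def relative_wer_reduction(baseline: float, finetuned: float) -> float:
    """Percentage of the baseline WER removed by fine-tuning."""
    return 100.0 * (baseline - finetuned) / baseline


for condition, (baseline, finetuned) in results.items():
    reduction = relative_wer_reduction(baseline, finetuned)
    print(f"{condition}: {reduction:.1f}% relative WER reduction")
```

The code-switching condition shows a reduction of roughly 95%, while the two monolingual conditions improve by about 14% (Twi) and 9% (English).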
Key Findings
- Fine-tuning reduces code-switching WER from 127.08 to 6.58
- The model achieves strong performance on bilingual utterances after adaptation
- Monolingual WER improves only modestly (Twi 116.08 to 99.44, English 110.26 to 100.43) and stays near 100, indicating limited cross-language transfer gain
- Code-switching appears to be the most learnable and most improved component of the task
Qualitative Examples
The model is capable of producing fluent bilingual outputs with preserved punctuation and natural speech patterns:
Example 1 -- Twitwa enam no into small pieces for the light soup.
Example 2 -- just realized that w'abusua yɛ Ɔyoko, so you are royalty.
Example 3 -- Wo nim sɛ I almost forgot to buy the food?
Limitations
- Model is sensitive to orthographic variation and punctuation
- Monolingual WER remains near 100 for both Twi and English after fine-tuning, and some degradation occurs on highly monolingual segments
- Requires further balancing of training data across languages
Intended Use
- Code-switching ASR research
- Low-resource African language speech recognition
- Bilingual speech transcription systems
- Linguistic analysis of English–Twi speech patterns
Ethical Considerations
- The model is intended for research and educational use only
- It should not be used for surveillance or unauthorized speech monitoring
- Bias may exist due to dataset imbalance between languages