---
license: apache-2.0
datasets:
- Kennethdot/Ghana_English-Twi_Code-switching_ASR
language:
- en
- tw
base_model:
- GiftMark/akan-whisper-model
---
# English–Twi Code-Switching ASR Model - Kasanoma
## Model Overview
This model is a fine-tuned Automatic Speech Recognition (ASR) system designed for English–Twi code-switching speech transcription. It is built on a pretrained Akan-adapted Whisper model and further fine-tuned on a curated bilingual dataset containing English, Twi, and mixed-language utterances.
The model supports natural bilingual speech, including intra-sentential and inter-sentential code-switching.
---
## Base Model
- `GiftMark/akan-whisper-model`
---
## Task
- Automatic Speech Recognition (ASR)
- Code-switching speech transcription
- English and Twi bilingual speech recognition
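
For the transcription task listed above, a minimal inference sketch might look like the following. It assumes the checkpoint is published in Hugging Face Transformers format and that `ffmpeg` is available for audio decoding. The card does not state the repository ID of the fine-tuned model, so the default below is the base model named in this card; substitute the fine-tuned checkpoint's ID. The `transcribe` helper and `MODEL_ID` constant are illustrative names, not part of any released API.

```python
# Assumption: this is the base model ID from this card. The fine-tuned
# checkpoint's repository ID is not given here, so replace it as needed.
MODEL_ID = "GiftMark/akan-whisper-model"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file with a Whisper-style checkpoint.

    Hypothetical helper: builds a Transformers ASR pipeline and returns
    the decoded text for a single file.
    """
    # Imported lazily so the module can be loaded without transformers.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    result = asr(audio_path)  # the pipeline returns a dict with a "text" key
    return result["text"]


# Example call (requires network access and a local audio file):
# print(transcribe("sample.wav"))
```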
---
## Dataset
- `Kennethdot/Ghana_English-Twi_Code-switching_ASR`

The dataset contains:
- Code-switched English–Twi speech
- Monolingual English and Twi speech
- Read and semi-spontaneous utterances
- Carefully transcribed bilingual speech with preserved linguistic structure
---
## Evaluation Setup
Evaluation was performed using **Word Error Rate (WER)** without text normalization.
This means:
- No lowercasing
- No punctuation removal
- No orthographic normalization applied
As a result, WER reflects raw transcription fidelity: any difference in casing, punctuation, or spelling counts as a word error.
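
The unnormalized scoring described above can be sketched as follows. This is not the evaluation script used for this card; it is a minimal illustration that assumes whitespace tokenization and corpus-level aggregation (total word edits divided by total reference words). The reference string is borrowed from the qualitative examples in this card, while the hypothesis strings are made up for illustration.

```python
def word_edit_distance(ref_words, hyp_words):
    """Levenshtein distance over word tokens (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            # Exact string match only: "Wo" != "wo" and "food?" != "food".
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def raw_wer(references, hypotheses):
    """Corpus-level WER in percent, with no text normalization applied."""
    total_errors = 0
    total_words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words = ref.split()
        hyp_words = hyp.split()
        total_errors += word_edit_distance(ref_words, hyp_words)
        total_words += len(ref_words)
    if total_words == 0:
        raise ValueError("references contain no words")
    return 100.0 * total_errors / total_words


# Casing and punctuation alone produce errors: 2 of 10 words differ.
reference = "Wo nim sɛ I almost forgot to buy the food?"
hypothesis = "wo nim sɛ I almost forgot to buy the food"  # hypothetical output
print(raw_wer([reference], [hypothesis]))  # 20.0

# Insertions are counted against the reference length, so WER can exceed 100.
print(raw_wer(["yes"], ["yes yes yes"]))  # 200.0
```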
---
## Results
| Model | Code-switched (CS) WER (%) | Twi WER (%) | English WER (%) |
|------|--------|----------|--------------|
| Zero-shot Akan Whisper Small | 127.08 | 116.08 | 110.26 |
| Fine-tuned Model | **6.58** | 99.44 | 100.43 |
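
To put the table in relative terms, the short calculation below computes the relative WER reduction for each test subset. It uses only the figures reported above. The function name and dictionary layout are illustrative choices.

```python
# (zero-shot WER, fine-tuned WER) pairs copied from the results table.
results = {
    "Code-switched": (127.08, 6.58),
    "Twi": (116.08, 99.44),
    "English": (110.26, 100.43),
}


def relative_wer_reduction(baseline, finetuned):
    """Relative WER reduction in percent: (baseline - finetuned) / baseline."""
    return 100.0 * (baseline - finetuned) / baseline


for subset, (baseline, finetuned) in results.items():
    reduction = relative_wer_reduction(baseline, finetuned)
    print(f"{subset}: {baseline:.2f} -> {finetuned:.2f} "
          f"({reduction:.1f}% relative reduction)")
# Code-switched: 94.8%, Twi: 14.3%, English: 8.9%
```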
---
## Key Findings
- Fine-tuning reduces **code-switching WER from 127.08 to 6.58**, the largest gain of any test subset
- Monolingual WER improves only modestly and stays near 100% (Twi: 116.08 to 99.44; English: 110.26 to 100.43), indicating limited transfer to monolingual speech
- Code-switching appears to be the most learnable and most improved component of the task
---
## Qualitative Examples
The model can produce fluent bilingual output with preserved punctuation and natural speech patterns:

- **Example 1:** Twitwa enam no into small pieces for the light soup.
- **Example 2:** just realized that w'abusua yɛ Ɔyoko, so you are royalty.
- **Example 3:** Wo nim sɛ I almost forgot to buy the food?
---
## Limitations
- The model is sensitive to orthographic variation and punctuation, and the unnormalized WER penalizes both
- Monolingual WER stays near 100% after fine-tuning, and some degradation occurs on highly monolingual segments
- Requires further balancing of training data across languages
---
## Intended Use
- Code-switching ASR research
- Low-resource African language speech recognition
- Bilingual speech transcription systems
- Linguistic analysis of English–Twi speech patterns
---
## Ethical Considerations
- The model is intended for research and educational use only
- It should not be used for surveillance or unauthorized speech monitoring
- Bias may exist due to dataset imbalance between languages