---
license: apache-2.0
datasets:
- Kennethdot/Ghana_English-Twi_Code-switching_ASR
language:
- en
- tw
base_model:
- GiftMark/akan-whisper-model
---
# English–Twi Code-Switching ASR Model - Kasanoma

## Model Overview

This model is a fine-tuned Automatic Speech Recognition (ASR) system for transcribing English–Twi code-switched speech. It is built on a pretrained Akan-adapted Whisper model and further fine-tuned on a curated bilingual dataset of English, Twi, and mixed-language utterances.

The model supports natural bilingual speech. This includes intra-sentential code-switching, where the language changes within a single sentence, and inter-sentential code-switching, where the language changes between sentences.

---

## Base Model

- `GiftMark/akan-whisper-model`

---

## Task

- Automatic Speech Recognition (ASR)
- Code-switching speech transcription
- English and Twi bilingual speech recognition

---

## Dataset

- `Kennethdot/Ghana_English-Twi_Code-switching_ASR`

The dataset contains:
- Code-switched English–Twi speech
- Monolingual English and Twi speech
- Read and semi-spontaneous utterances
- Carefully transcribed bilingual speech with preserved linguistic structure

---

## Evaluation Setup

Evaluation used **Word Error Rate (WER)** computed without any text normalization:
- No lowercasing
- No punctuation removal
- No orthographic normalization

The reported WER therefore reflects raw transcription fidelity. Differences in casing, punctuation, or spelling between the output and the reference all count as errors.
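
The minimal Python sketch below illustrates how unnormalized WER behaves. It is a plain word-level Levenshtein implementation written for illustration. The function name `wer` is hypothetical, and this is not necessarily the scoring code used to produce the reported numbers. Tokens are split on whitespace only, so case and attached punctuation are significant.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent, with no text normalization.

    Tokens are split on whitespace only, so "Food?" and "food" are
    different words: case and punctuation are both significant.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev is the previous DP row: prev[j] is the edit distance between the
    # reference words processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1  # exact string match, no normalization
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "Wo nim sɛ I almost forgot to buy the food?"
    print(wer(ref, ref))                                          # 0.0
    print(wer(ref, "wo nim sɛ I almost forgot to buy the food"))  # 20.0
```

In this example, the second hypothesis differs from the reference only in the capital "W" and the question mark. It still receives 20% WER, because 2 of its 10 words are counted as substitutions.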
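
---

## Results

All scores are word error rates in percent (lower is better), computed without normalization as described in Evaluation Setup.

| Model | Code-switched WER (%) | Twi WER (%) | English WER (%) |
|-------|-----------------------|-------------|-----------------|
| Zero-shot Akan Whisper Small | 127.08 | 116.08 | 110.26 |
| Fine-tuned model (Kasanoma) | **6.58** | **99.44** | **100.43** |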
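
Several zero-shot scores exceed 100%. This happens because WER counts inserted words as errors. The standard definition is:

$$
\mathrm{WER} = \frac{S + D + I}{N} \times 100\%
$$

Here $S$, $D$, and $I$ are the numbers of substituted, deleted, and inserted words in the minimum-edit alignment between hypothesis and reference, and $N$ is the number of reference words. Since $S + D \le N$, a WER above 100% implies that the hypotheses contain inserted words ($I > 0$).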

---

## Key Findings

- Fine-tuning yields a **large improvement on code-switched speech**, with WER dropping from 127.08% (zero-shot) to 6.58%
- Monolingual WER improves only modestly (Twi: 116.08% → 99.44%; English: 110.26% → 100.43%) and remains near 100%, indicating limited cross-language transfer of the gains
- Code-switching appears to be the most learnable and most improved component of the task
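
To put these changes on a common scale, the relative WER reduction for each subset can be computed from the Results table as $(\mathrm{WER}_{\text{zero-shot}} - \mathrm{WER}_{\text{fine-tuned}}) / \mathrm{WER}_{\text{zero-shot}}$:

$$
\begin{aligned}
\text{Code-switched:}\quad & \frac{127.08 - 6.58}{127.08} = \frac{120.50}{127.08} \approx 94.8\% \\
\text{Twi:}\quad & \frac{116.08 - 99.44}{116.08} = \frac{16.64}{116.08} \approx 14.3\% \\
\text{English:}\quad & \frac{110.26 - 100.43}{110.26} = \frac{9.83}{110.26} \approx 8.9\%
\end{aligned}
$$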

---

## Qualitative Examples

The model can produce fluent bilingual transcriptions that preserve punctuation and natural speech patterns. Approximate English glosses are given in parentheses.

**Example 1:** Twitwa enam no into small pieces for the light soup. *(Cut the meat into small pieces for the light soup.)*

**Example 2:** just realized that w'abusua yɛ Ɔyoko, so you are royalty. *(Just realized that your clan is Ɔyoko, so you are royalty.)*

**Example 3:** Wo nim sɛ I almost forgot to buy the food? *(Do you know that I almost forgot to buy the food?)*

---

## Limitations

- The model is sensitive to orthographic variation and punctuation; because evaluation applies no normalization, such differences count directly as WER errors
- Performance on highly monolingual segments remains weak after fine-tuning, with Twi and English WER near 100%
- Training data needs further balancing across languages
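
One way to inspect the sensitivity to casing and punctuation is to compare references and hypotheses after light normalization. The sketch below is an assumption and is not the evaluation procedure used in this card, which applies no normalization. The hypothetical `normalize` helper does three things:

- It lowercases text. Python's `str.lower` also maps Twi capitals such as Ɔ and Ɛ to ɔ and ɛ.
- It removes Unicode punctuation except apostrophes, so that elided forms like `w'abusua` survive.
- It collapses whitespace.

```python
import unicodedata

# Apostrophes are kept because Twi elisions (e.g. "w'abusua") use them inside words.
KEEP = {"'", "\u2019"}


def normalize(text: str) -> str:
    """Hypothetical normalizer: lowercase, drop punctuation except apostrophes, collapse spaces."""
    lowered = text.lower()
    # Unicode general categories starting with "P" are punctuation (Po, Pd, Ps, Pe, ...).
    # Note that hyphens (Pd) are removed too, joining hyphenated words.
    kept = "".join(
        ch for ch in lowered
        if ch in KEEP or not unicodedata.category(ch).startswith("P")
    )
    return " ".join(kept.split())


if __name__ == "__main__":
    ref = "just realized that w'abusua yɛ Ɔyoko, so you are royalty."
    hyp = "Just realized that w'abusua yɛ ɔyoko so you are royalty"
    print(ref == hyp)                        # False: raw strings differ
    print(normalize(ref) == normalize(hyp))  # True: only case and punctuation differed
```

WER computed on text normalized this way would not be directly comparable to the raw scores in the Results table.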

---

## Intended Use

- Code-switching ASR research
- Low-resource African language speech recognition
- Bilingual speech transcription systems
- Linguistic analysis of English–Twi speech patterns

---

## Ethical Considerations

- The model is intended for research and educational use only
- It should not be used for surveillance or unauthorized speech monitoring
- Outputs may be biased because the training data is imbalanced across languages