---
license: apache-2.0
datasets:
- Kennethdot/Ghana_English-Twi_Code-switching_ASR
language:
- en
- tw
base_model:
- GiftMark/akan-whisper-model
---
# English–Twi Code-Switching ASR Model - Kasanoma

## Model Overview

This model is a fine-tuned Automatic Speech Recognition (ASR) system designed for English–Twi code-switching speech transcription. It is built on a pretrained Akan-adapted Whisper model and further fine-tuned on a curated bilingual dataset containing English, Twi, and mixed-language utterances.

The model targets natural bilingual speech, including intra-sentential code-switching (switches within a sentence) and inter-sentential code-switching (switches between sentences).
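A minimal transcription sketch, assuming the checkpoint can be loaded through the Hugging Face `transformers` ASR pipeline like other Whisper models. This card does not state the fine-tuned model's repository id, so the default below is the base model named in the metadata; `transcribe` and the audio path in the usage comment are hypothetical names used only for illustration.

```python
def transcribe(audio_path, model_id="GiftMark/akan-whisper-model"):
    """Transcribe one audio file and return the predicted text.

    model_id defaults to the base model listed in this card; pass the
    fine-tuned checkpoint's repository id to use it instead (assumption:
    it loads the same way).
    """
    # Imported lazily so this function can be defined without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    result = asr(audio_path)  # decoding a file path requires ffmpeg
    return result["text"]


# Usage (hypothetical file name):
#   text = transcribe("sample.wav")
```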

---

## Base Model

- `GiftMark/akan-whisper-model`

---

## Task

- Automatic Speech Recognition (ASR)
- Code-switching speech transcription
- English and Twi bilingual speech recognition

---


## Dataset

- `Kennethdot/Ghana_English-Twi_Code-switching_ASR`

The dataset contains:
- Code-switched English–Twi speech
- Monolingual English and Twi speech
- Read and semi-spontaneous utterances
- Carefully transcribed bilingual speech with preserved linguistic structure

---

## Evaluation Setup

Evaluation was performed using **Word Error Rate (WER)** without text normalization.

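This means that references and hypotheses were compared as-is:
- No lowercasing
- No punctuation removal
- No orthographic normalization

The reported WER therefore reflects raw transcription fidelity: differences in casing, punctuation, or spelling each count as word errors.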
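To illustrate what unnormalized scoring implies, the sketch below computes word-level WER with a plain Levenshtein alignment. It is not the evaluation script used for this model (the card does not name one), and the function names are illustrative. It also shows why WER can exceed 100: inserted words count as errors, but the denominator is only the reference length.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance (substitutions, deletions, insertions)."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """WER in percent, with no lowercasing or punctuation stripping."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hyp_words) / len(ref_words)


# Casing and punctuation differences count as errors: "Wo"/"wo" and "food?"/"food".
strict = wer("Wo nim sɛ I almost forgot to buy the food?",
             "wo nim sɛ I almost forgot to buy the food")

# Repeated (inserted) words push WER above 100.
repeated = wer("the food", "the the the food food")
```

Here `strict` is 20.0 (2 errors over 10 reference words) and `repeated` is 150.0 (3 insertions over 2 reference words).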

---

## Results

| Model | Code-switching (CS) WER (%) | Twi WER (%) | English WER (%) |
|------|--------|----------|--------------|
| Zero-shot Akan Whisper Small | 127.08 | 116.08 | 110.26 |
| Fine-tuned Model | **6.58** | 99.44 | 100.43 |
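The size of each change is easier to compare as a relative WER reduction, computed as (baseline - fine-tuned) / baseline. The short sketch below applies that formula to the figures in the table; the dictionary and function names are illustrative.

```python
# (zero-shot WER, fine-tuned WER), copied from the results table
results = {
    "CS": (127.08, 6.58),
    "Twi": (116.08, 99.44),
    "English": (110.26, 100.43),
}


def relative_wer_reduction(baseline, finetuned):
    """Relative WER reduction in percent."""
    return 100.0 * (baseline - finetuned) / baseline


reductions = {name: relative_wer_reduction(base, tuned)
              for name, (base, tuned) in results.items()}

for name, value in reductions.items():
    print(f"{name}: {value:.1f}% relative WER reduction")
```

This gives roughly 94.8% for code-switched speech, 14.3% for Twi, and 8.9% for English.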

---

## Key Findings

- Fine-tuning leads to a **significant improvement in code-switching ASR performance**: CS WER falls from 127.08 to 6.58
- The model achieves strong performance on bilingual utterances after adaptation
- Monolingual WER changes far less (Twi 116.08 to 99.44, English 110.26 to 100.43) and stays near 100, indicating limited cross-language transfer gain
- Code-switching appears to be the most learnable and most improved component of the task

---

## Qualitative Examples

The model produces fluent bilingual output that preserves punctuation and natural speech patterns:

**Example 1** -- Twitwa enam no into small pieces for the light soup.

**Example 2** -- just realized that w'abusua yɛ Ɔyoko, so you are royalty.

**Example 3** -- Wo nim sɛ I almost forgot to buy the food?

---

## Limitations

- The model is sensitive to orthographic variation and punctuation, both of which count as errors under the unnormalized WER used here
- Performance on purely monolingual segments remains poor after fine-tuning, with Twi and English WER near 100
- The training data requires further balancing across languages

---

## Intended Use

- Code-switching ASR research
- Low-resource African language speech recognition
- Bilingual speech transcription systems
- Linguistic analysis of English–Twi speech patterns

---

## Ethical Considerations

- The model is intended for research and educational use only
- It should not be used for surveillance or unauthorized speech monitoring
- Bias may exist due to dataset imbalance between languages