metadata
license: apache-2.0
datasets:
  - Kennethdot/Ghana_English-Twi_Code-switching_ASR
language:
  - en
  - tw
base_model:
  - GiftMark/akan-whisper-model

English–Twi Code-Switching ASR Model - Kasanoma

Model Overview

This model is a fine-tuned Automatic Speech Recognition (ASR) system designed for English–Twi code-switching speech transcription. It is built on a pretrained Akan-adapted Whisper model and further fine-tuned on a curated bilingual dataset containing English, Twi, and mixed-language utterances.

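The model supports natural bilingual speech, including both intra-sentential code-switching (switching languages within a sentence) and inter-sentential code-switching (switching languages between sentences).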


Base Model

  • GiftMark/akan-whisper-model

Task

  • Automatic Speech Recognition (ASR)
  • Code-switching speech transcription
  • English and Twi bilingual speech recognition

Dataset

  • Kennethdot/Ghana_English-Twi_Code-switching_ASR

The dataset contains:

  • Code-switched English–Twi speech
  • Monolingual English and Twi speech
  • Read and semi-spontaneous utterances
  • Carefully transcribed bilingual speech with preserved linguistic structure

Evaluation Setup

Evaluation was performed using Word Error Rate (WER) without text normalization.

This means:

  • No lowercasing
  • No punctuation removal
  • No orthographic normalization applied

WER therefore reflects raw transcription fidelity: differences in casing, punctuation, or spelling all count as errors.
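The function below is a minimal sketch of unnormalized WER. It assumes the standard word-level Levenshtein definition (substitutions + deletions + insertions, divided by the number of reference words) and plain whitespace tokenization; the card does not say which tool produced its scores, so the function name, tokenization, and example sentences are illustrative assumptions. Because substitutions plus deletions can never exceed the number of reference words, a WER above 100% can only arise from insertions, which is how the zero-shot scores in the results can exceed 100.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent on raw text: no lowercasing, no punctuation removal."""
    ref = reference.split()  # whitespace tokenization (assumed)
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1  # exact, case-sensitive match
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# Hypothetical reference/hypothesis pairs for illustration only
print(word_error_rate("Wo nim sɛ I almost forgot to buy the food?",
                      "wo nim sɛ I almost forgot to buy the food"))  # 20.0
print(word_error_rate("Medaase", "Me da a se"))  # 400.0
```

The first pair scores 20% even though it differs only in one capital letter and a question mark: without normalization, each of those differences counts as a full word substitution. The second pair shows how a single reference word split into four hypothesis words (one substitution plus three insertions) yields 400%.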


Results

| Model | Code-switched (CS) WER (%) | Twi WER (%) | English WER (%) |
| --- | --- | --- | --- |
| Zero-shot Akan Whisper Small | 127.08 | 116.08 | 110.26 |
| Fine-tuned Model | 6.58 | 99.44 | 100.43 |

Key Findings

  • Fine-tuning greatly improves code-switching ASR performance, reducing CS WER from 127.08 to 6.58
  • The model achieves strong performance on bilingual utterances after adaptation
  • Monolingual performance improves only modestly (Twi WER 116.08 to 99.44, English WER 110.26 to 100.43) and remains near 100%, indicating limited transfer of the gains to monolingual speech
  • Code-switching appears to be the most learnable and most improved component of the task
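To make the size of these changes concrete, the absolute and relative WER reductions can be computed directly from the results table. This is plain arithmetic on the reported numbers; the dictionary keys and the `relative_reduction` helper are illustrative names, not part of the model's tooling.

```python
# WER values (%) copied from the results table
zero_shot = {"cs": 127.08, "twi": 116.08, "english": 110.26}
fine_tuned = {"cs": 6.58, "twi": 99.44, "english": 100.43}


def relative_reduction(before: float, after: float) -> float:
    """Relative WER reduction in percent: (before - after) / before * 100."""
    return 100.0 * (before - after) / before


for subset in zero_shot:
    absolute = zero_shot[subset] - fine_tuned[subset]  # drop in WER points
    relative = relative_reduction(zero_shot[subset], fine_tuned[subset])
    print(f"{subset:8s} absolute: {absolute:6.2f} points  relative: {relative:5.1f}%")
```

This gives roughly a 94.8% relative reduction on code-switched speech, compared with about 14.3% for Twi and 8.9% for English, while the monolingual scores stay close to 100% in absolute terms.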

Qualitative Examples

The model can produce fluent bilingual outputs that preserve punctuation and natural speech patterns:

Example 1 -- Twitwa enam no into small pieces for the light soup.

Example 2 -- just realized that w'abusua yɛ Ɔyoko, so you are royalty.

Example 3 -- Wo nim sɛ I almost forgot to buy the food?


Limitations

  • Model is sensitive to orthographic variation and punctuation
  • Some degradation may occur on highly monolingual segments after fine-tuning, and overall monolingual WER remains near 100%
  • Requires further balancing of training data across languages
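Because the reported scores are unnormalized, casing, punctuation, and spelling variants all count as word errors. The sketch below shows one possible normalizer for a side-by-side comparison. The steps it applies (Unicode NFC composition, mapping the curly apostrophe U+2019 to a straight one, lowercasing, and removing ASCII punctuation other than apostrophes) are assumptions chosen for illustration, not the card's evaluation protocol, and `normalize` is a hypothetical helper.

```python
import re
import string
import unicodedata

# Hypothetical normalizer for comparison only; the card's reported WER applies none of these steps.
_PUNCT_EXCEPT_APOSTROPHE = string.punctuation.replace("'", "")
_STRIP_PUNCT = str.maketrans("", "", _PUNCT_EXCEPT_APOSTROPHE)


def normalize(text: str) -> str:
    """Compose Unicode, unify apostrophes, lowercase, drop punctuation, collapse spaces."""
    text = unicodedata.normalize("NFC", text)  # base letter + combining mark -> one code point
    text = text.replace("\u2019", "'")         # curly apostrophe -> straight apostrophe
    text = text.lower()                        # also maps Ɛ -> ɛ and Ɔ -> ɔ
    text = text.translate(_STRIP_PUNCT)        # keeps apostrophes, e.g. in "w'abusua"
    return re.sub(r"\s+", " ", text).strip()


ref = "Wo nim sɛ I almost forgot to buy the food?"
hyp = "wo nim sɛ I almost forgot to buy the food"
print(ref.split() == hyp.split())                        # False: raw tokens differ
print(normalize(ref).split() == normalize(hyp).split())  # True after normalization
```

Apostrophes are kept on purpose: stripping them would turn elided Twi forms such as w'abusua into a different token, which would itself introduce orthographic mismatches.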

Intended Use

  • Code-switching ASR research
  • Low-resource African language speech recognition
  • Bilingual speech transcription systems
  • Linguistic analysis of English–Twi speech patterns

Ethical Considerations

  • The model is intended for research and educational use only
  • It should not be used for surveillance or unauthorized speech monitoring
  • Bias may exist due to dataset imbalance between languages