kasanoma / README.md

Update README.md

6f2eb55 verified 2 days ago

2.96 kB

license: apache-2.0
datasets:
  - Kennethdot/Ghana_English-Twi_Code-switching_ASR
language:
  - en
  - tw
base_model:
  - GiftMark/akan-whisper-model

English–Twi Code-Switching ASR Model - Kasanoma

Model Overview

This model is a fine-tuned Automatic Speech Recognition (ASR) system designed for English–Twi code-switching speech transcription. It is built on a pretrained Akan-adapted Whisper model and further fine-tuned on a curated bilingual dataset containing English, Twi, and mixed-language utterances.

The model supports natural bilingual speech, including intra-sentential and inter-sentential code-switching.

Base Model

GiftMark/akan-whisper-model

Task

Automatic Speech Recognition (ASR)
Code-switching speech transcription
English and Twi bilingual speech recognition

Dataset

Kennethdot/Ghana_English-Twi_Code-switching_ASR

The dataset contains:

Code-switched English–Twi speech
Monolingual English and Twi speech
Read and semi-spontaneous utterances
Carefully transcribed bilingual speech with preserved linguistic structure

Evaluation Setup

Evaluation was performed using Word Error Rate (WER) without text normalization.

This means:

No lowercasing
No punctuation removal
No orthographic normalization applied

WER reflects raw transcription fidelity.

Results

Model	CS WER	Twi WER	English WER
Zero-shot Akan Whisper Small	127.08	116.08	110.26
Fine-tuned Model	6.58	99.44	100.43

Key Findings

Fine-tuning leads to a significant improvement in code-switching ASR performance
The model achieves strong performance on bilingual utterances after adaptation
Monolingual performance remains relatively unchanged, indicating limited cross-language transfer gain
Code-switching appears to be the most learnable and most improved component of the task

Qualitative Examples

The model is capable of producing fluent bilingual outputs with preserved punctuation and natural speech patterns:

Example 1 -- Twitwa enam no into small pieces for the light soup.

Example 2 -- just realized that w'abusua yɛ Ɔyoko, so you are royalty.

Example 3 -- Wo nim sɛ I almost forgot to buy the food?

Limitations

Model is sensitive to orthographic variation and punctuation
Some degradation occurs on highly monolingual segments after fine-tuning
Requires further balancing of training data across languages

Intended Use

Code-switching ASR research
Low-resource African language speech recognition
Bilingual speech transcription systems
Linguistic analysis of English–Twi speech patterns

Ethical Considerations

The model is intended for research and educational use only
It should not be used for surveillance or unauthorized speech monitoring
Bias may exist due to dataset imbalance between languages