	---
	license: apache-2.0
	datasets:
	- Kennethdot/Ghana_English-Twi_Code-switching_ASR
	language:
	- en
	- tw
	base_model:
	- GiftMark/akan-whisper-model
	---
	# English–Twi Code-Switching ASR Model - Kasanoma

	## Model Overview

	This model is a fine-tuned Automatic Speech Recognition (ASR) system designed for English–Twi code-switching speech transcription. It is built on a pretrained Akan-adapted Whisper model and further fine-tuned on a curated bilingual dataset containing English, Twi, and mixed-language utterances.

	The model supports natural bilingual speech, including intra-sentential and inter-sentential code-switching.

	---

	## Base Model

	- `GiftMark/akan-whisper-model`

	---

	## Task

	- Automatic Speech Recognition (ASR)
	- Code-switching speech transcription
	- English and Twi bilingual speech recognition
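For the transcription task listed above, a minimal inference sketch might look like the following. It assumes the checkpoint is published under the repo id `Kennethdot/kasanoma` in a format that the Hugging Face `transformers` ASR pipeline can load, and that `ffmpeg` is available for decoding audio files; neither is confirmed by this card. The helper name `transcribe` is hypothetical.

```python
# Assumed repo id; adjust if the checkpoint lives elsewhere.
MODEL_ID = "Kennethdot/kasanoma"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file with a Whisper-style ASR checkpoint.

    Hypothetical helper for illustration, not part of the released model.
    """
    # Imported lazily so the module can be loaded without transformers installed.
    from transformers import pipeline

    # Downloads the checkpoint on first use, so network access is required.
    asr = pipeline("automatic-speech-recognition", model=model_id)
    result = asr(audio_path)  # returns a dict with a "text" field
    return result["text"]
```

Calling `transcribe("sample.wav")` on a local recording (the file name is a placeholder) should return the transcription as a plain string.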

	---


	## Dataset

	- `Kennethdot/Ghana_English-Twi_Code-switching_ASR`

	The dataset contains:
	- Code-switched English–Twi speech
	- Monolingual English and Twi speech
	- Read and semi-spontaneous utterances
	- Carefully transcribed bilingual speech with preserved linguistic structure

	---

	## Evaluation Setup

	Evaluation was performed using Word Error Rate (WER) without text normalization.

	This means:
	- No lowercasing
	- No punctuation removal
	- No orthographic normalization applied

	The reported WER therefore reflects raw transcription fidelity: differences in casing, punctuation, or spelling all count as word errors.
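To make the effect of skipping normalization concrete, here is a small self-contained sketch of WER computed over whitespace-separated tokens with word-level edit distance. It is an illustration under stated assumptions, not the evaluation script used for this model; the function name `word_error_rate` and the reference/hypothesis pair are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent, with no text normalization applied."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# Illustrative pair: the hypothesis differs only in casing and punctuation.
reference = "Wo nim sɛ I almost forgot to buy the food?"
hypothesis = "wo nim sɛ I almost forgot to buy the food"
print(word_error_rate(reference, hypothesis))  # 2 of 10 words differ: 20.0
```

Because insertions are counted against the reference length, WER can exceed 100 when the hypothesis contains many extra words, which is how scores above 100 can arise.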

	---

	## Results

	| Model | Code-switching (CS) WER (%) | Twi WER (%) | English WER (%) |
	|------|--------|----------|--------------|
	| Zero-shot Akan Whisper Small | 127.08 | 116.08 | 110.26 |
	| Fine-tuned Model | 6.58 | 99.44 | 100.43 |
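One way to compare the rows of the table is relative WER reduction, (baseline - fine-tuned) / baseline. The sketch below applies it to the reported numbers; the metric choice and the names `RESULTS` and `relative_wer_reduction` are illustrative assumptions, not part of the original evaluation.

```python
# (zero-shot WER, fine-tuned WER) pairs copied from the results table.
RESULTS = {
    "code-switching": (127.08, 6.58),
    "Twi": (116.08, 99.44),
    "English": (110.26, 100.43),
}


def relative_wer_reduction(baseline: float, fine_tuned: float) -> float:
    """Return the relative WER reduction in percent."""
    return 100.0 * (baseline - fine_tuned) / baseline


for name, (baseline, fine_tuned) in RESULTS.items():
    reduction = relative_wer_reduction(baseline, fine_tuned)
    print(f"{name}: {reduction:.1f}% relative WER reduction")
```

This gives roughly 95% for code-switched speech, against roughly 14% for Twi and 9% for English.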

	---

	## Key Findings

	- Fine-tuning reduces code-switching WER from 127.08 to 6.58, by far the largest gain across the three evaluation categories
	- The model achieves strong performance on bilingual utterances after adaptation
	- Monolingual WER improves only slightly (Twi: 116.08 to 99.44; English: 110.26 to 100.43) and stays near 100, indicating limited cross-language transfer gain
	- Code-switching appears to be the most learnable and most improved component of the task

	---

	## Qualitative Examples

	The model produces fluent bilingual output with preserved punctuation and natural speech patterns, as in these sample transcriptions:

	Example 1 -- Twitwa enam no into small pieces for the light soup.

	Example 2 -- just realized that w'abusua yɛ Ɔyoko, so you are royalty.

	Example 3 -- Wo nim sɛ I almost forgot to buy the food?

	---

	## Limitations

	- The model's measured accuracy is sensitive to orthographic variation and punctuation, since evaluation applies no text normalization
	- Some degradation occurs on highly monolingual segments after fine-tuning
	- Requires further balancing of training data across languages

	---

	## Intended Use

	- Code-switching ASR research
	- Low-resource African language speech recognition
	- Bilingual speech transcription systems
	- Linguistic analysis of English–Twi speech patterns

	---

	## Ethical Considerations

	- The model is intended for research and educational use only
	- It should not be used for surveillance or unauthorized speech monitoring
	- Bias may exist due to dataset imbalance between languages