---
license: apache-2.0
datasets:
- Kennethdot/Ghana_English-Twi_Code-switching_ASR
language:
- en
- tw
base_model:
- GiftMark/akan-whisper-model
---
# English–Twi Code-Switching ASR Model - Kasanoma

## Model Overview

This model is a fine-tuned Automatic Speech Recognition (ASR) system designed for English–Twi code-switching speech transcription. It is built on a pretrained Akan-adapted Whisper model and further fine-tuned on a curated bilingual dataset containing English, Twi, and mixed-language utterances.

The model supports natural bilingual speech, including intra-sentential and inter-sentential code-switching.

---

## Base Model

- `GiftMark/akan-whisper-model`

---

## Task

- Automatic Speech Recognition (ASR)
- Code-switching speech transcription
- English and Twi bilingual speech recognition
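A minimal inference sketch follows. It assumes the standard Hugging Face `transformers` ASR pipeline works with this checkpoint the way it does for other Whisper fine-tunes. This card does not state the model's Hub repository ID, so the reader must supply `model_id`. The helpers `load_asr` and `transcribe` are hypothetical names used only for illustration.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def load_asr(model_id: str):
    """Build (once per model_id) a transformers ASR pipeline for a Whisper checkpoint."""
    # Imported here so the helpers can be defined without transformers installed.
    from transformers import pipeline

    return pipeline("automatic-speech-recognition", model=model_id)


def transcribe(audio_path: str, model_id: str) -> str:
    """Transcribe one audio file and return the raw text, with no normalization applied."""
    asr = load_asr(model_id)
    result = asr(audio_path)
    return result["text"]
```

For example, call `transcribe("sample.wav", model_id="<hub-id-of-this-model>")`. Decoding audio from a file path requires `ffmpeg` to be installed.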

---


## Dataset

- `Kennethdot/Ghana_English-Twi_Code-switching_ASR`

The dataset contains:
- Code-switched English–Twi speech
- Monolingual English and Twi speech
- Read and semi-spontaneous utterances
- Carefully transcribed bilingual speech with preserved linguistic structure
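The sketch below loads the dataset with the Hugging Face `datasets` library and resamples the audio to the 16 kHz rate that Whisper's feature extractor expects. It is a minimal example, and the audio column name `"audio"` is an assumption: check the dataset card for the actual split and column names. `load_cs_dataset` is a hypothetical helper.

```python
DATASET_ID = "Kennethdot/Ghana_English-Twi_Code-switching_ASR"
WHISPER_SAMPLING_RATE = 16_000  # Whisper models are trained on 16 kHz audio


def load_cs_dataset(audio_column: str = "audio"):
    """Download the dataset and resample its audio column for Whisper."""
    # Imported here so this helper can be defined without `datasets` installed.
    from datasets import Audio, load_dataset

    # Returns a DatasetDict keyed by split.
    ds = load_dataset(DATASET_ID)
    # "audio" is an ASSUMED column name; pass the real one if it differs.
    return ds.cast_column(audio_column, Audio(sampling_rate=WHISPER_SAMPLING_RATE))
```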

---

## Evaluation Setup

Evaluation was performed using **Word Error Rate (WER)** without text normalization.

This means:
- No lowercasing
- No punctuation removal
- No orthographic normalization applied

The reported WER therefore reflects raw transcription fidelity: any difference in casing or punctuation counts as a word error.
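The following is a from-scratch illustration of unnormalized WER. The card does not say which WER implementation was used; libraries such as `jiwer` are common. Errors are pooled across utterances (corpus-level WER), which is an assumption about how the reported figures were aggregated. `word_edit_distance` and `corpus_wer` are hypothetical names.

```python
def word_edit_distance(reference: str, hypothesis: str) -> int:
    """Minimum substitutions + deletions + insertions turning reference words into hypothesis words."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev holds the DP row for the previous reference prefix (initially: the empty prefix).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            # Exact string comparison: no lowercasing, punctuation stripping, or other normalization.
            substitution = prev[j - 1] + (r != h)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1]


def corpus_wer(pairs: list[tuple[str, str]]) -> float:
    """Corpus WER in percent: total word errors divided by total reference words."""
    errors = sum(word_edit_distance(ref, hyp) for ref, hyp in pairs)
    words = sum(len(ref.split()) for ref, _ in pairs)
    return 100.0 * errors / words
```

Consider the reference `"Wo nim sɛ I almost forgot to buy the food?"` and the hypothesis `"wo nim sɛ I almost forgot to buy the food"`. The lowercase `wo` and the missing `?` each count as one error, giving 2 errors over 10 reference words, or 20% WER.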

---

## Results

| Model | Code-switched WER (%) | Twi WER (%) | English WER (%) |
|-------|-----------------------|-------------|-----------------|
| Zero-shot Akan Whisper Small | 127.08 | 116.08 | 110.26 |
| Fine-tuned Model | **6.58** | 99.44 | 100.43 |
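WER values above 100 in the table are possible because insertions are not bounded by the reference length. The standard definition uses $S$ substitutions, $D$ deletions, $I$ insertions, and $N$ reference words. Every reference word is either matched, substituted, or deleted, so $S + D \le N$, and any WER above 100 implies $I > 0$. The numbers in the second line below are a hypothetical worked example, not taken from the evaluation:

$$
\mathrm{WER} = 100 \times \frac{S + D + I}{N}, \qquad S + D \le N
$$

$$
N = 4,\; S = 4,\; D = 0,\; I = 2 \;\Rightarrow\; \mathrm{WER} = 100 \times \frac{4 + 0 + 2}{4} = 150
$$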

---

## Key Findings

- Fine-tuning leads to a **substantial improvement in code-switching ASR performance**, reducing code-switched WER from 127.08 to 6.58
- The model achieves strong performance on bilingual utterances after adaptation
- Monolingual WER improves only modestly (Twi: 116.08 to 99.44; English: 110.26 to 100.43) and remains near 100%, indicating limited cross-language transfer gain
- Code-switching appears to be the most learnable and most improved component of the task
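A few lines of arithmetic on the Results table show the size of the changes behind these findings. This is illustrative only, and `wer_reductions` is a hypothetical helper.

```python
# Word error rates (%) copied from the Results table.
RESULTS = {
    "Code-switched": {"zero_shot": 127.08, "fine_tuned": 6.58},
    "Twi": {"zero_shot": 116.08, "fine_tuned": 99.44},
    "English": {"zero_shot": 110.26, "fine_tuned": 100.43},
}


def wer_reductions(results: dict) -> dict:
    """Absolute (WER points) and relative (%) reduction from zero-shot to fine-tuned."""
    out = {}
    for subset, r in results.items():
        absolute = r["zero_shot"] - r["fine_tuned"]
        relative = 100.0 * absolute / r["zero_shot"]
        out[subset] = (round(absolute, 2), round(relative, 1))
    return out


for subset, (absolute, relative) in wer_reductions(RESULTS).items():
    print(f"{subset}: -{absolute:.2f} points ({relative:.1f}% relative)")
```

Code-switched WER falls by roughly 95% relative. Twi and English WER fall by only about 14% and 9% relative, respectively.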

---

## Qualitative Examples

The model can produce fluent bilingual transcripts that preserve punctuation and natural speech patterns:

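**Example 1** -- Twitwa enam no into small pieces for the light soup. *(Cut the meat into small pieces for the light soup.)*

**Example 2** -- just realized that w'abusua yɛ Ɔyoko, so you are royalty. *(Just realized that your clan is Oyoko, so you are royalty.)*

**Example 3** -- Wo nim sɛ I almost forgot to buy the food? *(Do you know that I almost forgot to buy the food?)*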

---

## Limitations

- The model is sensitive to orthographic variation and punctuation
- Accuracy on predominantly monolingual segments remains low after fine-tuning, with Twi and English WER both near 100%
- The training data needs further balancing across languages
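The reported WER is unnormalized. Scoring with a normalized variant as well can separate recognition errors from orthographic and punctuation mismatches. The minimal sketch below assumes that NFC Unicode normalization, case folding, and punctuation removal are acceptable for Twi text. It is not the procedure behind the reported results, which applied no normalization. Apostrophes are kept because removing them would merge Twi elisions such as `w'abusua` into a different token. `normalize_transcript` is a hypothetical name.

```python
import unicodedata

# Apostrophes mark elision in Twi (e.g. w'abusua), so they are preserved.
KEEP = {"'", "\u2019"}


def normalize_transcript(text: str) -> str:
    """Case-fold, apply Unicode NFC, drop punctuation (except apostrophes), and collapse whitespace."""
    text = unicodedata.normalize("NFC", text.casefold())
    # Unicode general categories starting with "P" are punctuation (Po, Pd, Ps, Pe, ...).
    kept = "".join(c for c in text if c in KEEP or not unicodedata.category(c).startswith("P"))
    return " ".join(kept.split())
```

Hyphens are punctuation under this rule, so hyphenated words are joined (for example, `code-switching` becomes `codeswitching`). Applying the same function to both references and hypotheses keeps the comparison consistent.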

---

## Intended Use

- Code-switching ASR research
- Low-resource African language speech recognition
- Bilingual speech transcription systems
- Linguistic analysis of English–Twi speech patterns

---

## Ethical Considerations

- The model is intended for research and educational use only
- It should not be used for surveillance or unauthorized speech monitoring
- Bias may exist due to dataset imbalance between languages