MedASR-Ghana: Medical ASR for Ghanaian-Accented English

MedASR-Ghana is a fine-tuned version of Google's MedASR, optimized for recognizing Ghanaian-accented English speech and intended for clinical and medical transcription in Ghana.

Model Description

MedASR-Ghana transcribes English speech from speakers with Ghanaian accents, including those from Twi, Akan, and Fante language backgrounds. It builds on Google's MedASR (a 105M-parameter Conformer-based CTC model) and adapts it to the pronunciation patterns of Ghanaian-accented English.

Key Features

Optimized for Ghanaian accents: Fine-tuned on Twi-, Akan-, and Fante-accented English
Medical domain ready: Inherits MedASR's medical vocabulary capabilities
Lightweight: 105M parameters - efficient for deployment
CTC-based: Simple greedy decoding, no language model required
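
To make the "simple greedy decoding" point concrete, here is a minimal sketch of the collapse step that greedy CTC decoding performs after taking the per-frame argmax. The toy vocabulary and the choice of id 0 as the blank symbol are assumptions for illustration only; the real ids come from the model's tokenizer.

```python
from itertools import groupby


def ctc_greedy_collapse(frame_ids, blank_id=0):
    """Turn per-frame argmax ids into a label sequence (greedy CTC)."""
    # Step 1: merge runs of identical consecutive ids.
    merged = [token for token, _ in groupby(frame_ids)]
    # Step 2: drop the blank symbol that separates repeated labels.
    return [token for token in merged if token != blank_id]


# Hypothetical toy vocabulary; id 0 plays the role of the CTC blank.
vocab = {0: "<blank>", 1: "f", 2: "e", 3: "v", 4: "r"}
frames = [0, 1, 1, 0, 2, 2, 3, 0, 2, 4, 4, 0]
decoded = "".join(vocab[i] for i in ctc_greedy_collapse(frames))
print(decoded)
```

A blank between two identical ids keeps both labels, which is how CTC represents doubled letters. In the usage code, `processor.batch_decode` handles this step.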

Performance

Metric	Score
Test WER	37.53%
Validation WER	44.56%
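
For readers unfamiliar with the metric, word error rate is the word-level edit distance between reference and hypothesis, divided by the number of reference words. The sketch below is a plain-Python illustration, not the evaluation script used for this model. It splits on whitespace only, and the card does not state what text normalization (casing, punctuation) was applied when scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences, not taken from the dataset.
print(word_error_rate("patient has a high fever", "patient has high fever"))
```

Because insertions count as errors, WER can exceed 100% on very poor transcripts.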

Training Progress

The model was trained for 120 epochs. Test WER dropped sharply over the first 40 epochs and improved only marginally after that:

Epochs	Test WER
10	55.26%
40	40.21%
80	38.00%
120	37.53%
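
The diminishing returns in the table are easier to see as relative reductions between consecutive checkpoints. This is a small arithmetic sketch using only the four values reported above; the function name is illustrative.

```python
# (epoch, test WER in %) pairs copied from the table above.
checkpoints = [(10, 55.26), (40, 40.21), (80, 38.00), (120, 37.53)]


def relative_reductions(points):
    """Return (start_epoch, end_epoch, absolute_drop, relative_drop_percent)."""
    rows = []
    for (e0, w0), (e1, w1) in zip(points, points[1:]):
        absolute = w0 - w1
        relative = 100 * absolute / w0
        rows.append((e0, e1, round(absolute, 2), round(relative, 2)))
    return rows


for start, end, absolute, relative in relative_reductions(checkpoints):
    print(f"epochs {start}-{end}: -{absolute} points ({relative}% relative)")
```

The first interval accounts for roughly 27% relative reduction, while the last 40 epochs contribute a little over 1%.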

Training Data

Fine-tuned on the AfriSpeech-200 dataset, using all Ghanaian accent configurations. The table lists the number of samples per accent and split:

Accent	Train	Validation	Test
Twi	1,315	186	58
Akan	131	-	26
Akan-Fante	230	33	32
Total	1,676	219	116

Total audio: ~5.16 hours of Ghanaian-accented English speech
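
The split sizes above also show how unevenly the accents are represented. The sketch below recomputes the totals and each accent's share of the training split from the table. It assumes the "-" for Akan validation means zero samples.

```python
# Sample counts per accent as (train, validation, test), from the table above.
# Assumption: the "-" entry for Akan validation is treated as 0.
split_counts = {
    "Twi": (1315, 186, 58),
    "Akan": (131, 0, 26),
    "Akan-Fante": (230, 33, 32),
}

# Column-wise totals for train, validation, and test.
totals = tuple(sum(column) for column in zip(*split_counts.values()))

# Each accent's percentage share of the training split.
train_total = totals[0]
train_share = {
    accent: 100 * counts[0] / train_total
    for accent, counts in split_counts.items()
}

print(totals)
for accent, share in train_share.items():
    print(f"{accent}: {share:.1f}% of training samples")
```

Twi makes up more than three quarters of the training samples, which is worth keeping in mind when reading the accent-coverage limitation.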

Usage

Basic Usage

from transformers import AutoProcessor, AutoModelForCTC
import torch
import librosa

# Load model and processor
model_id = "samwell/medasr-ghana"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForCTC.from_pretrained(model_id)

# Load audio as 16 kHz mono (librosa resamples if the file uses another rate)
audio, sr = librosa.load("your_audio.wav", sr=16000)
inputs = processor(audio, sampling_rate=sr, return_tensors="pt")

# Transcribe
with torch.no_grad():
    logits = model(**inputs).logits
    predicted_ids = torch.argmax(logits, dim=-1)
    transcription = processor.batch_decode(predicted_ids)[0]

print(transcription)

With Hugging Face Pipeline

from transformers import pipeline

transcriber = pipeline(
    "automatic-speech-recognition",
    model="samwell/medasr-ghana"
)

result = transcriber("your_audio.wav")
print(result["text"])

Training Procedure

Hyperparameters

Learning rate: 3e-5
Batch size: 8 per device, with 4 gradient accumulation steps (effective batch size 32)
Epochs: 120
Warmup steps: 300
Optimizer: AdamW
Precision: BF16
Hardware: NVIDIA L4 GPU (24GB)
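
As a rough sanity check on these settings, the sketch below estimates the number of optimizer steps and the share of training spent in warmup. It assumes a single GPU, that all 1,676 training samples are used every epoch, and that a partial final batch still counts as a step. The Trainer's exact step count can differ slightly because of how it rounds.

```python
import math

train_samples = 1676        # from the Training Data table
per_device_batch_size = 8
gradient_accumulation = 4
epochs = 120
warmup_steps = 300

# Samples consumed per optimizer update.
effective_batch = per_device_batch_size * gradient_accumulation

# Approximate optimizer updates per epoch (partial last batch counted).
steps_per_epoch = math.ceil(train_samples / effective_batch)

total_steps = steps_per_epoch * epochs
warmup_fraction = warmup_steps / total_steps

print(effective_batch, steps_per_epoch, total_steps)
print(f"warmup covers about {100 * warmup_fraction:.1f}% of training")
```

Under these assumptions, warmup spans a little under 5% of all updates, or about the first six epochs.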

Training Configuration

TrainingArguments(
    learning_rate=3e-5,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    num_train_epochs=120,
    warmup_steps=300,
    bf16=True,
    group_by_length=True,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="wer",
)
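
One detail matters when reusing this configuration. `TrainingArguments` documents that `greater_is_better` defaults to True whenever `metric_for_best_model` is set to a name that does not end in "loss". WER is a lower-is-better metric, so `load_best_model_at_end=True` only selects the lowest-WER checkpoint if `greater_is_better=False` is passed explicitly. The card does not say whether that flag was set. The sketch below mimics the documented default in plain Python; the helper names are hypothetical, and the test WER values from the Training Progress table stand in for per-epoch evaluation results.

```python
def default_greater_is_better(metric_name: str) -> bool:
    """Mirror the documented default: True unless the name ends in 'loss'."""
    return not metric_name.endswith("loss")


def pick_best(history, greater_is_better: bool):
    """Pick the (epoch, metric) pair a best-checkpoint rule would keep."""
    chooser = max if greater_is_better else min
    return chooser(history, key=lambda item: item[1])


# (epoch, WER %) pairs borrowed from the Training Progress table.
wer_history = [(10, 55.26), (40, 40.21), (80, 38.00), (120, 37.53)]

with_default = pick_best(wer_history, default_greater_is_better("wer"))
with_explicit_flag = pick_best(wer_history, greater_is_better=False)
print(with_default, with_explicit_flag)
```

With the default, the rule keeps the highest-WER checkpoint; with the explicit flag, it keeps the lowest.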

Intended Use

Primary Use Cases

Clinical transcription in Ghanaian healthcare settings
Medical dictation for doctors and nurses with Ghanaian accents
Healthcare documentation automation in Ghana
Telemedicine applications serving Ghanaian patients

Out of Scope

Non-English transcription (this model is English-only)
Accents significantly different from West African English
Real-time streaming (model is optimized for batch processing)

Limitations

Limited training data: Only ~5 hours of Ghanaian audio
WER of 37.53%: Roughly three word errors for every eight reference words, so production use will likely require post-processing or language-model decoding
Domain bias: Best performance on clinical/medical content
Accent coverage: Primarily Twi, Akan, and Fante - may perform differently on other Ghanaian accents

Ethical Considerations

This model should be used to assist healthcare professionals, not replace clinical judgment
Transcription errors in medical contexts can have serious consequences - always verify critical information
The model inherits biases from its training data and base model

Citation

If you use this model, please cite:

@misc{medasr-ghana,
  title={MedASR-Ghana: Medical ASR for Ghanaian-Accented English},
  author={samwell},
  year={2026},
  publisher={Hugging Face},
  url={https://huggingface.co/samwell/medasr-ghana}
}

Related Work

@article{afrispeech2023,
  title={AfriSpeech-200: Pan-African Accented Speech Dataset for Clinical and General Domain ASR},
  author={Olatunji, Tobi and others},
  journal={arXiv preprint arXiv:2310.00274},
  year={2023}
}

@article{medasr2024,
  title={MedASR: Medical Automatic Speech Recognition},
  author={Google Health AI},
  year={2024}
}

Model Card Contact

For questions or feedback, please open an issue on the model repository.
