MedASR-Ghana: Medical ASR for Ghanaian-Accented English

This model is a fine-tuned version of Google's MedASR optimized for Ghanaian-accented English speech recognition, particularly suited for clinical and medical transcription in Ghana.

Model Description

MedASR-Ghana is designed to transcribe English speech from speakers with Ghanaian accents, including Twi, Akan, and Fante language backgrounds. It builds on Google's MedASR foundation (a 105M parameter Conformer-based CTC model) and adapts it specifically for West African English pronunciation patterns.

Key Features

  • Optimized for Ghanaian accents: Trained on Twi, Akan, and Fante accented English
  • Medical domain ready: Inherits MedASR's medical vocabulary capabilities
  • Lightweight: 105M parameters - efficient for deployment
  • CTC-based: Simple greedy decoding, no language model required

Performance

Metric Score
Test WER 37.53%
Validation WER 44.56%

Training Progress

The model was trained for 120 epochs, with WER improving steadily:

Epochs Test WER
10 55.26%
40 40.21%
80 38.00%
120 37.53%

Training Data

Fine-tuned on the AfriSpeech-200 dataset, using all Ghanaian accent configurations:

Accent Train Validation Test
Twi 1,315 186 58
Akan 131 - 26
Akan-Fante 230 33 32
Total 1,676 219 116

Total audio: ~5.16 hours of Ghanaian-accented English speech

Usage

Basic Usage

from transformers import AutoProcessor, AutoModelForCTC
import torch
import librosa

# Load model and processor
model_id = "samwell/medasr-ghana"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForCTC.from_pretrained(model_id)

# Load and preprocess audio
audio, sr = librosa.load("your_audio.wav", sr=16000)
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

# Transcribe
with torch.no_grad():
    logits = model(**inputs).logits
    predicted_ids = torch.argmax(logits, dim=-1)
    transcription = processor.batch_decode(predicted_ids)[0]

print(transcription)

With Hugging Face Pipeline

from transformers import pipeline

transcriber = pipeline(
    "automatic-speech-recognition",
    model="samwell/medasr-ghana"
)

result = transcriber("your_audio.wav")
print(result["text"])

Training Procedure

Hyperparameters

  • Learning rate: 3e-5
  • Batch size: 8 (with gradient accumulation of 4 = effective batch size 32)
  • Epochs: 120
  • Warmup steps: 300
  • Optimizer: AdamW
  • Precision: BF16
  • Hardware: NVIDIA L4 GPU (24GB)

Training Configuration

TrainingArguments(
    learning_rate=3e-5,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    num_train_epochs=120,
    warmup_steps=300,
    bf16=True,
    group_by_length=True,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="wer",
)

Intended Use

Primary Use Cases

  • Clinical transcription in Ghanaian healthcare settings
  • Medical dictation for doctors and nurses with Ghanaian accents
  • Healthcare documentation automation in Ghana
  • Telemedicine applications serving Ghanaian patients

Out of Scope

  • Non-English transcription (this model is English-only)
  • Accents significantly different from West African English
  • Real-time streaming (model is optimized for batch processing)

Limitations

  • Limited training data: Only ~5 hours of Ghanaian audio
  • WER of 37.53%: May require post-processing or language model for production use
  • Domain bias: Best performance on clinical/medical content
  • Accent coverage: Primarily Twi, Akan, and Fante - may perform differently on other Ghanaian accents

Ethical Considerations

  • This model should be used to assist healthcare professionals, not replace clinical judgment
  • Transcription errors in medical contexts can have serious consequences - always verify critical information
  • The model inherits biases from its training data and base model

Citation

If you use this model, please cite:

@misc{medasr-ghana,
  title={MedASR-Ghana: Medical ASR for Ghanaian-Accented English},
  author={samwell},
  year={2026},
  publisher={Hugging Face},
  url={https://huggingface.co/samwell/medasr-ghana}
}

Related Work

@article{afrispeech2023,
  title={AfriSpeech-200: Pan-African Accented Speech Dataset for Clinical and General Domain ASR},
  author={Olatunji, Tobi and others},
  journal={arXiv preprint arXiv:2310.00274},
  year={2023}
}

@article{medasr2024,
  title={MedASR: Medical Automatic Speech Recognition},
  author={Google Health AI},
  year={2024}
}

Model Card Contact

For questions or feedback, please open an issue on the model repository.

Downloads last month
55
Safetensors
Model size
0.1B params
Tensor type
F32
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for samwell/medasr-ghana

Base model

google/medasr
Finetuned
(2)
this model

Dataset used to train samwell/medasr-ghana

Collection including samwell/medasr-ghana

Paper for samwell/medasr-ghana