Model Card for whisper-medium-ht

Model Details

Model Description

This model is openai/whisper-medium fine-tuned for Haitian Creole speech recognition on General Conference talks from The Church of Jesus Christ of Latter-day Saints.

  • Developed by: Zachary Clement
  • Model type: Automatic Speech Recognition
  • Language(s): Haitian Creole
  • License: Apache 2.0
  • Finetuned from model: openai/whisper-medium

Uses

This model is intended for transcribing Haitian Creole audio to text. I did not check for catastrophic forgetting, so I do not recommend using it on other languages.
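
The intended use can be sketched as follows. This is a minimal example, assuming the `transformers` library (with a PyTorch backend and ffmpeg for audio decoding) is installed; the `transcribe` helper and the audio file name are hypothetical and not part of the model repository.

```python
MODEL_ID = "clementzach/whisper-medium-ht"  # repository name as listed in the Results table


def transcribe(audio_path: str) -> str:
    """Transcribe one Haitian Creole audio file to text."""
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model=MODEL_ID,
        chunk_length_s=30,  # Whisper operates on 30-second windows
    )
    result = asr(
        audio_path,
        # Force Haitian Creole transcription instead of relying on language detection.
        generate_kwargs={"language": "ht", "task": "transcribe"},
    )
    return result["text"]


# Example call (the path is a placeholder, not a file shipped with the model):
# text = transcribe("sample_ht.wav")
```

The first call downloads the model weights, so it needs network access and enough memory for a model of this size.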

Bias, Risks, and Limitations

Fine-tuning may have caused catastrophic forgetting for languages other than Haitian Creole; performance on those languages was not evaluated.

Training Details

Training Data

This model was trained on Haitian Creole audio and transcriptions of General Conference meetings of The Church of Jesus Christ of Latter-day Saints.

Audio from General Conference talks was split into segments. An ASR system produced a rough, garbled transcription of each segment, and an LLM then matched these garbled ASR outputs to the corresponding passages of the reference transcriptions.

In total, 9,838 training samples were used, comprising 41 hours of labeled data.
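
As a quick sanity check on these figures, the average segment length can be estimated from the two numbers above. The result is approximate because the hour count is rounded.

```python
TRAIN_SAMPLES = 9_838  # training samples reported above
TRAIN_HOURS = 41       # labeled hours reported above (rounded)

# Average duration of one training segment, in seconds.
avg_seconds = TRAIN_HOURS * 3600 / TRAIN_SAMPLES
print(f"Average segment length: {avg_seconds:.1f} s")
```

This works out to roughly 15 seconds per segment, comfortably inside Whisper's 30-second context window.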

Training Procedure

Preprocessing

Audio is resampled to 16 kHz mono. Each sample is transformed into an 80-bin log-mel spectrogram using Whisper's WhisperFeatureExtractor (30-second context window).
Transcriptions are tokenized with WhisperTokenizer configured for Haitian Creole (ht) transcription. Samples whose tokenized label exceeds 448 tokens are dropped before training. No audio augmentation is applied to the phonetic candidate; it trains on the original phonetic-alignment segments only.
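
The label-length filter described above could look like the sketch below. It is not the author's code: the `drop_long_labels` and `load_whisper_tokenize` helpers and the `"text"` and `"labels"` field names are assumptions made for illustration. The tokenizer is passed in as a function so the filter itself has no dependency on `transformers`.

```python
from typing import Callable, Iterable

MAX_LABEL_TOKENS = 448  # limit stated in the preprocessing description


def drop_long_labels(
    samples: Iterable[dict],
    tokenize: Callable[[str], list],
    max_tokens: int = MAX_LABEL_TOKENS,
) -> list:
    """Keep only samples whose tokenized transcription has at most max_tokens tokens."""
    kept = []
    for sample in samples:
        label_ids = tokenize(sample["text"])  # "text" is a hypothetical field name
        if len(label_ids) <= max_tokens:
            kept.append({**sample, "labels": label_ids})
    return kept


def load_whisper_tokenize() -> Callable[[str], list]:
    """Build a tokenize function from the base model's tokenizer (requires transformers)."""
    from transformers import WhisperTokenizer

    tokenizer = WhisperTokenizer.from_pretrained(
        "openai/whisper-medium", language="ht", task="transcribe"
    )
    return lambda text: tokenizer(text).input_ids


# Example with a stand-in whitespace tokenizer; swap in load_whisper_tokenize() for real use.
demo = [{"text": "bonjou tout moun"}, {"text": "a " * 500}]
print(len(drop_long_labels(demo, str.split)))
```

Audio feature extraction is left out here; only the token-count filter is shown.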

The dataset is split by talk ID: 15 specific talks are held out as a fixed evaluation set and fully excluded from all training splits (no segment from an eval talk appears in any training candidate, including synthetic ones).
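
A talk-level split of this kind can be sketched as below. The `split_by_talk` function and the `"talk_id"` field name are hypothetical; the point is that the split is made on whole talks rather than on individual segments, so no segment of a held-out talk can leak into training.

```python
def split_by_talk(segments, eval_talk_ids):
    """Split segments into (train, evaluation) lists by the talk they come from."""
    eval_ids = set(eval_talk_ids)
    train = [s for s in segments if s["talk_id"] not in eval_ids]
    evaluation = [s for s in segments if s["talk_id"] in eval_ids]

    # Leakage check: no held-out talk may contribute a training segment.
    leaked = {s["talk_id"] for s in train} & eval_ids
    if leaked:
        raise ValueError(f"Held-out talks found in training data: {sorted(leaked)}")
    return train, evaluation


# Toy example; "some_training_talk" is a made-up ID used only for illustration.
segments = [
    {"talk_id": "2012_10_Nourise_a", "segment": 0},
    {"talk_id": "2012_10_Nourise_a", "segment": 1},
    {"talk_id": "some_training_talk", "segment": 0},
]
train, evaluation = split_by_talk(segments, ["2012_10_Nourise_a"])
print(len(train), len(evaluation))
```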

Training Hyperparameters

  • Training regime: bf16 mixed precision
  • Base model: openai/whisper-medium
  • Task/language: transcribe / Haitian Creole (ht)
  • Encoder: fully trainable (no freezing)
  • Max steps: 3,000
  • Per-device train batch size: 16
  • Gradient accumulation steps: 2 (effective batch size: 32)
  • Per-device eval batch size: 8
  • Learning rate: 1e-5 with cosine decay
  • Warmup steps: 500
  • Gradient checkpointing: enabled
  • Eval/save strategy: every 500 steps (max steps // 6), best checkpoint retained by WER
  • Save total limit: 3 checkpoints
  • Generation max length: 225 tokens
  • Optimizer: AdamW (HuggingFace Trainer default)
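
The settings above map naturally onto keyword arguments for Hugging Face's `Seq2SeqTrainingArguments`. The dictionary below is a sketch of that mapping, not the author's actual configuration: the output directory is hypothetical, and using `load_best_model_at_end` with `metric_for_best_model` is only an assumption about how the best checkpoint was retained by WER.

```python
MAX_STEPS = 3_000
TRAIN_SAMPLES = 9_838  # training-set size reported in this card

TRAINING_KWARGS = {
    "output_dir": "whisper-medium-ht",  # hypothetical path
    "max_steps": MAX_STEPS,
    "per_device_train_batch_size": 16,
    "gradient_accumulation_steps": 2,
    "per_device_eval_batch_size": 8,
    "learning_rate": 1e-5,
    "lr_scheduler_type": "cosine",
    "warmup_steps": 500,
    "bf16": True,
    "gradient_checkpointing": True,
    "eval_strategy": "steps",  # called "evaluation_strategy" in older transformers releases
    "eval_steps": MAX_STEPS // 6,
    "save_strategy": "steps",
    "save_steps": MAX_STEPS // 6,
    "save_total_limit": 3,
    "load_best_model_at_end": True,  # assumed mechanism for keeping the best checkpoint
    "metric_for_best_model": "wer",  # requires compute_metrics to return a "wer" key
    "greater_is_better": False,      # lower WER is better
    "predict_with_generate": True,
    "generation_max_length": 225,
    "logging_steps": 25,
    "report_to": ["tensorboard"],
}

# Derived quantities, assuming a single GPU.
effective_batch_size = (
    TRAINING_KWARGS["per_device_train_batch_size"]
    * TRAINING_KWARGS["gradient_accumulation_steps"]
)
approx_epochs = MAX_STEPS * effective_batch_size / TRAIN_SAMPLES

# With transformers installed on bf16-capable hardware:
# from transformers import Seq2SeqTrainingArguments
# args = Seq2SeqTrainingArguments(**TRAINING_KWARGS)
```

With an effective batch size of 32, 3,000 steps correspond to 96,000 samples seen, or just under 10 passes over the 9,838 training samples.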

Speeds, Sizes, Times

  • Hardware: single NVIDIA L4 (24 GB VRAM), 16 GB RAM, 4 vCPUs
  • Training duration: 3,000 steps
  • Logging: every 25 steps (TensorBoard)

Evaluation

Testing Data, Factors & Metrics

Testing Data

Testing data: 15 held-out Haitian Creole conference talks, excluded from all training splits. Talk IDs:

  • 2012_10_Nourise_a
  • 2021_10_Diyite_Pa_Vledi_Nou_Pafe
  • 2021_4_Kisa_Sove_nou_an_te_fe_pou_nou
  • 2021_10_Lanmou_Bondye_Sa_ki_bay_nanm_lan_plis_lajwa
  • 2012_10_Bon_Dezi_pou_Angaje
  • 2012_10_Eprev_lafwa_nou
  • 2015_4_Prezidan_Dieter_F_Uchtdorf
  • 2019_10_Bay_Espri_nou_Kontwol_Sou_Ko_nou
  • 2021_4_Chemen_alyans_lan
  • 2019_4_Jan_l_te_fe_a
  • 2014_4_Fotifye_nou_e_pran_kouraj
  • 2021_4_Bondye_Nan_Mitan_Nou
  • 2018_10_Vin_tounen_Sen_Denye_Jou_Egzanple
  • 2015_10_Elde_Quentin_L_Cook
  • 2012_4_Rapo_Depatman_Odit_Legliz_la_2011

Metrics

  • WER (Word Error Rate): primary metric; used for best-checkpoint selection during training
  • CER (Character Error Rate): secondary metric
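
Both metrics are edit distances normalized by reference length: WER counts word-level substitutions, insertions and deletions, and CER does the same over characters. The sketch below is a dependency-free illustration for a single utterance. The card does not say which library or text normalization was used for the reported scores, so this is not necessarily how they were computed, and the example strings are illustrative only.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitution, insertion, deletion)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


reference = "mwen renmen ou anpil"
hypothesis = "mwen renmen w anpil"
print(f"WER = {wer(reference, hypothesis):.2%}, CER = {cer(reference, hypothesis):.2%}")
```

Corpus-level scores are usually obtained by summing errors and reference lengths over all utterances before dividing, rather than by averaging per-utterance rates.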

Results

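On the held-out set, the fine-tuned model reaches 34.13% WER and 20.30% CER, compared with 69.13% WER and 31.30% CER for openai/whisper-large-v3, the strongest of the baseline Whisper models tested.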

Model                            WER       CER
openai/whisper-large-v3          69.13%    31.30%
openai/whisper-medium            84.25%    40.89%
openai/whisper-small             97.28%    48.84%
clementzach/whisper-medium-ht    34.13%    20.30%
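
To put the table in relative terms, the snippet below computes the relative WER reduction of the fine-tuned model against each baseline, using only the numbers reported above. The `relative_reduction` helper is introduced here for illustration.

```python
# (WER %, CER %) on the held-out set, copied from the results table.
RESULTS = {
    "openai/whisper-large-v3": (69.13, 31.30),
    "openai/whisper-medium": (84.25, 40.89),
    "openai/whisper-small": (97.28, 48.84),
    "clementzach/whisper-medium-ht": (34.13, 20.30),
}
FINE_TUNED = "clementzach/whisper-medium-ht"


def relative_reduction(baseline: float, new: float) -> float:
    """Relative error reduction in percent: 100 * (baseline - new) / baseline."""
    return 100 * (baseline - new) / baseline


ft_wer, _ft_cer = RESULTS[FINE_TUNED]
wer_reduction = {
    model: relative_reduction(base_wer, ft_wer)
    for model, (base_wer, _base_cer) in RESULTS.items()
    if model != FINE_TUNED
}
for model, value in wer_reduction.items():
    print(f"{model}: {value:.1f}% relative WER reduction")
```

Against its own base model (openai/whisper-medium), fine-tuning removes about 59% of word errors; against openai/whisper-large-v3 the reduction is about 51%.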