enenlhet-whisper-model

The model in this repository was developed by Samuel J. Huskey in partial fulfillment of a seed funding grant from the University of Oklahoma's Data Institute for Societal Challenges (DISC). The project's title is "AI for Cost-Effective Research Workflows When Funding is Scarce" (co-PIs: Samuel J. Huskey, Raina Heaton, and Caroline T. Schroeder). Varun Sayapaneni, Research Informatics Specialist at OU Libraries, contributed valuable insights and made important contributions to the project.

See the code repository at https://github.com/sjhuskey/enenlhet for more information.

Model description

This is a fine-tuned version of OpenAI's Whisper for automatic speech recognition (ASR) of Enenlhet, an Indigenous language of Paraguay. The model has roughly 0.2B parameters and is distributed as float32 safetensors.

Uses

The model is intended to provide first-pass transcriptions of field recordings of Enenlhet speakers.
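
The model can be loaded with the Hugging Face transformers library. A minimal inference sketch (the settings below, such as the 30-second chunk length, are illustrative rather than taken from the project's code):

```python
# Sketch: load the fine-tuned model from the Hugging Face Hub and transcribe
# a field recording. Requires transformers, torch, and ffmpeg.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="sjhuskey/enenlhet-whisper-model",
)

# chunk_length_s splits recordings longer than Whisper's 30-second window.
result = asr("recording.wav", chunk_length_s=30)
print(result["text"])
```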

Training and evaluation data

The training and evaluation data consist of .wav files of Enenlhet speakers and .eaf (ELAN) files containing transcriptions of those recordings.

The raw data was preprocessed with the script at https://github.com/sjhuskey/enenlhet/blob/main/python/whisper-prepare-dataset.py, which extracts the time-aligned annotations from the .eaf files and normalizes and segments the corresponding audio files.
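
For orientation, the general approach looks roughly like the following (a simplified sketch, not the script itself; the tier name, paths, and 16 kHz mono normalization are assumptions):

```python
# Sketch: pair each time-aligned .eaf annotation with the matching audio clip.
# Requires pympi-ling and pydub; "transcription" is a hypothetical tier name.
import os

import pympi
from pydub import AudioSegment

eaf = pympi.Elan.Eaf("recording.eaf")
audio = AudioSegment.from_wav("recording.wav")
audio = audio.set_frame_rate(16000).set_channels(1)  # Whisper expects 16 kHz mono

os.makedirs("segments", exist_ok=True)
for i, (start_ms, end_ms, text) in enumerate(
    eaf.get_annotation_data_for_tier("transcription")
):
    clip = audio[start_ms:end_ms]  # pydub slices by milliseconds
    clip.export(f"segments/segment_{i:04d}.wav", format="wav")
    with open(f"segments/segment_{i:04d}.txt", "w", encoding="utf-8") as f:
        f.write(text)
```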

The dataset was split 90/5/5 into train, test, and validation sets, as sketched below.
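
One way to produce such a split with the datasets library (a sketch; the seed and two-step method are assumptions, and the authoritative logic is in the linked script and notebook):

```python
# Sketch: 90/5/5 split via two successive train_test_split calls.
from datasets import Dataset, DatasetDict

# Illustrative records; in practice these come from the segmented clips above.
records = {
    "audio": [f"segments/segment_{i:04d}.wav" for i in range(100)],
    "text": ["placeholder transcription"] * 100,
}
ds = Dataset.from_dict(records)

split = ds.train_test_split(test_size=0.10, seed=42)  # 90% train, 10% held out
holdout = split["test"].train_test_split(test_size=0.50, seed=42)  # 5% / 5%
dataset = DatasetDict({
    "train": split["train"],
    "test": holdout["train"],
    "validation": holdout["test"],
})
```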

The model was trained on Colab with a version of the Jupyter notebook at https://github.com/sjhuskey/enenlhet/blob/main/python/whisper_train_colab.ipynb, which was based on the tutorial at https://huggingface.co/blog/fine-tune-whisper#prepare-environment.
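
Following that tutorial, the core model setup looks like the sketch below. The base checkpoint is not named in this card; openai/whisper-small is an assumption consistent with the tutorial and the model's roughly 0.2B-parameter size.

```python
# Sketch: load the base Whisper checkpoint and processor for fine-tuning.
from transformers import WhisperForConditionalGeneration, WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-small")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

# As in the tutorial: no forced language/task tokens, since Whisper has no
# Enenlhet language token; the model learns the target text during fine-tuning.
model.config.forced_decoder_ids = None
model.config.suppress_tokens = []
```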

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the sketch after this block for how they map onto Seq2SeqTrainingArguments):

per_device_train_batch_size=16,
gradient_accumulation_steps=2,
learning_rate=1.25e-5,
predict_with_generate=False,
generation_max_length=80,
warmup_steps=500,
fp16=True,
eval_strategy="epoch",
num_train_epochs=15,
save_strategy="epoch",
save_total_limit=15,
logging_steps=10,
load_best_model_at_end=True,
metric_for_best_model="eval_loss",
greater_is_better=False
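
These values map directly onto transformers' Seq2SeqTrainingArguments; a sketch (only output_dir is an illustrative addition):

```python
# Sketch: the listed hyperparameters as Seq2SeqTrainingArguments.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./enenlhet-whisper-model",  # illustrative
    per_device_train_batch_size=16,
    gradient_accumulation_steps=2,
    learning_rate=1.25e-5,
    predict_with_generate=False,
    generation_max_length=80,
    warmup_steps=500,
    fp16=True,
    eval_strategy="epoch",
    num_train_epochs=15,
    save_strategy="epoch",
    save_total_limit=15,
    logging_steps=10,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)
```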

Training results

| Epoch | Training Loss | Validation Loss | WER      |
|-------|---------------|-----------------|----------|
| 1     | 3.994700      | 3.676985        | 1.591700 |
| 2     | 2.001900      | 1.887706        | 1.597055 |
| 3     | 1.541100      | 1.465359        | 1.804552 |
| 4     | 0.967300      | 1.218838        | 1.504685 |
| 5     | 0.529700      | 0.959480        | 1.105756 |
| 6     | 0.281000      | 0.960863        | 1.733601 |
| 7     | 0.122700      | 0.968978        | 1.437751 |
| 8     | 0.065600      | 0.992428        | 1.341365 |
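
Two notes on the table. WER can exceed 1.0 (100%) because insertions count as errors, so a hypothesis can contain more errors than the reference has words; that is why several epochs report WER above 1. And since training used load_best_model_at_end=True with metric_for_best_model="eval_loss", the epoch-5 checkpoint (validation loss 0.959480, WER 1.105756) would be the one retained. The WER values are presumably computed as in the linked tutorial, i.e., with the evaluate library; a minimal sketch with illustrative strings:

```python
# Sketch: word error rate with the evaluate library.
import evaluate

wer_metric = evaluate.load("wer")
wer = wer_metric.compute(
    predictions=["the model produced this first pass"],
    references=["the speaker produced this passage"],
)
print(f"WER: {wer:.3f}")  # (substitutions + deletions + insertions) / reference words
```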

Environmental Impacts

Output of CodeCarbon for one fine-tuning run (duration in seconds, power in watts, energy in kWh, emissions in kg CO₂eq):

timestamp: 2025-08-29T22:13:49
duration: 3332.9584745599996
emissions: 0.0696567052363746
emissions_rate: 2.089936186365788e-05
cpu_power: 42.5
gpu_power: 50.06983315793876
ram_power: 38.0
cpu_energy: 0.0390040343676826
gpu_energy: 0.074261856353882
ram_energy: 0.034693371266176
energy_consumed: 0.1479592619877408
os: Linux-6.1.123+-x86_64-with-glibc2.35
python_version: 3.12.11
codecarbon_version: 3.0.4
cpu_count: 12
cpu_model: Intel(R) Xeon(R) CPU @ 2.20GHz
gpu_count: 1
gpu_model: 1 x NVIDIA A100-SXM4-40GB
ram_total_size: 83.4760627746582
tracking_mode: machine
on_cloud: N
pue: 1.0
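
For reference, a report like the one above can be produced by wrapping the training run in a CodeCarbon tracker. A minimal sketch (the actual integration used in the notebook is not shown in this card):

```python
# Sketch: track the estimated emissions of a training run with CodeCarbon.
from codecarbon import EmissionsTracker

def fine_tune():
    """Stand-in for the actual Whisper fine-tuning run."""
    pass

tracker = EmissionsTracker()  # writes an emissions.csv with the fields above
tracker.start()
try:
    fine_tune()
finally:
    emissions_kg = tracker.stop()  # total estimated emissions in kg CO2eq
print(f"Estimated emissions: {emissions_kg:.4f} kg CO2eq")
```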