enenlhet-whisper-model

The model in this repository was developed by Samuel J. Huskey in partial fulfillment of a seed funding grant from the University of Oklahoma's Data Institute for Societal Challenges (DISC). The project's title is "AI for Cost-Effective Research Workflows When Funding is Scarce" (co-PIs: Samuel J. Huskey, Raina Heaton, and Caroline T. Schroeder). Varun Sayapaneni, Research Informatics Specialist at OU Libraries, contributed valuable insights and made important contributions to the project.

See the code repository at https://github.com/sjhuskey/enenlhet for more information.

Model description

This is a fine-tuned version of OpenAI's Whisper for automatic speech recognition (ASR) of Enenlhet, an Indigenous language of Paraguay. The model has roughly 0.2B parameters and is distributed as float32 safetensors.

Uses

The model is intended to provide first-pass transcriptions of field recordings of Enenlhet speakers.
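
The model can be loaded with the Hugging Face transformers library. A minimal inference sketch (the settings below, such as the 30-second chunk length, are illustrative rather than taken from the project's code):

```python
# Sketch: load the fine-tuned model from the Hugging Face Hub and transcribe
# a field recording. Requires transformers, torch, and ffmpeg.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="sjhuskey/enenlhet-whisper-model",
)

# chunk_length_s splits recordings longer than Whisper's 30-second window.
result = asr("recording.wav", chunk_length_s=30)
print(result["text"])
```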

Training and evaluation data

The training and evaluation data consist of .wav files of Enenlhet speakers and .eaf (ELAN) files containing transcriptions of those recordings.

The raw data was preprocessed with the script at https://github.com/sjhuskey/enenlhet/blob/main/python/whisper-prepare-dataset.py, which extracts the time-aligned annotations from the .eaf files and normalizes and segments the corresponding audio files.
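
For orientation, the general approach looks roughly like the following (a simplified sketch, not the script itself; the tier name, paths, and 16 kHz mono normalization are assumptions):

```python
# Sketch: pair each time-aligned .eaf annotation with the matching audio clip.
# Requires pympi-ling and pydub; "transcription" is a hypothetical tier name.
import os

import pympi
from pydub import AudioSegment

eaf = pympi.Elan.Eaf("recording.eaf")
audio = AudioSegment.from_wav("recording.wav")
audio = audio.set_frame_rate(16000).set_channels(1)  # Whisper expects 16 kHz mono

os.makedirs("segments", exist_ok=True)
for i, (start_ms, end_ms, text) in enumerate(
    eaf.get_annotation_data_for_tier("transcription")
):
    clip = audio[start_ms:end_ms]  # pydub slices by milliseconds
    clip.export(f"segments/segment_{i:04d}.wav", format="wav")
    with open(f"segments/segment_{i:04d}.txt", "w", encoding="utf-8") as f:
        f.write(text)
```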

The dataset was split 90/5/5 into train, test, and validation sets, as sketched below.
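
One way to produce such a split with the datasets library (a sketch; the seed and two-step method are assumptions, and the authoritative logic is in the linked script and notebook):

```python
# Sketch: 90/5/5 split via two successive train_test_split calls.
from datasets import Dataset, DatasetDict

# Illustrative records; in practice these come from the segmented clips above.
records = {
    "audio": [f"segments/segment_{i:04d}.wav" for i in range(100)],
    "text": ["placeholder transcription"] * 100,
}
ds = Dataset.from_dict(records)

split = ds.train_test_split(test_size=0.10, seed=42)  # 90% train, 10% held out
holdout = split["test"].train_test_split(test_size=0.50, seed=42)  # 5% / 5%
dataset = DatasetDict({
    "train": split["train"],
    "test": holdout["train"],
    "validation": holdout["test"],
})
```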

The model was trained on Colab with a version of the Jupyter notebook at https://github.com/sjhuskey/enenlhet/blob/main/python/whisper_train_colab.ipynb, which was based on the tutorial at https://huggingface.co/blog/fine-tune-whisper#prepare-environment.
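
Following that tutorial, the core model setup looks like the sketch below. The base checkpoint is not named in this card; openai/whisper-small is an assumption consistent with the tutorial and the model's roughly 0.2B-parameter size.

```python
# Sketch: load the base Whisper checkpoint and processor for fine-tuning.
from transformers import WhisperForConditionalGeneration, WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-small")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

# As in the tutorial: no forced language/task tokens, since Whisper has no
# Enenlhet language token; the model learns the target text during fine-tuning.
model.config.forced_decoder_ids = None
model.config.suppress_tokens = []
```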

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the sketch after this block for how they map onto Seq2SeqTrainingArguments):

per_device_train_batch_size=16,
gradient_accumulation_steps=2,
learning_rate=1.25e-5,
predict_with_generate=False,
generation_max_length=80,
warmup_steps=500,
fp16=True,
eval_strategy="epoch",
num_train_epochs=15,
save_strategy="epoch",
save_total_limit=15,
logging_steps=10,
load_best_model_at_end=True,
metric_for_best_model="eval_loss",
greater_is_better=False
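
These values map directly onto transformers' Seq2SeqTrainingArguments; a sketch (only output_dir is an illustrative addition):

```python
# Sketch: the listed hyperparameters as Seq2SeqTrainingArguments.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./enenlhet-whisper-model",  # illustrative
    per_device_train_batch_size=16,
    gradient_accumulation_steps=2,
    learning_rate=1.25e-5,
    predict_with_generate=False,
    generation_max_length=80,
    warmup_steps=500,
    fp16=True,
    eval_strategy="epoch",
    num_train_epochs=15,
    save_strategy="epoch",
    save_total_limit=15,
    logging_steps=10,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)
```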

Training results

| Epoch | Training Loss | Validation Loss | WER      |
|-------|---------------|-----------------|----------|
| 1     | 3.994700      | 3.676985        | 1.591700 |
| 2     | 2.001900      | 1.887706        | 1.597055 |
| 3     | 1.541100      | 1.465359        | 1.804552 |
| 4     | 0.967300      | 1.218838        | 1.504685 |
| 5     | 0.529700      | 0.959480        | 1.105756 |
| 6     | 0.281000      | 0.960863        | 1.733601 |
| 7     | 0.122700      | 0.968978        | 1.437751 |
| 8     | 0.065600      | 0.992428        | 1.341365 |
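
Two notes on the table. WER can exceed 1.0 (100%) because insertions count as errors, so a hypothesis can contain more errors than the reference has words; that is why several epochs report WER above 1. And since training used load_best_model_at_end=True with metric_for_best_model="eval_loss", the epoch-5 checkpoint (validation loss 0.959480, WER 1.105756) would be the one retained. The WER values are presumably computed as in the linked tutorial, i.e., with the evaluate library; a minimal sketch with illustrative strings:

```python
# Sketch: word error rate with the evaluate library.
import evaluate

wer_metric = evaluate.load("wer")
wer = wer_metric.compute(
    predictions=["the model produced this first pass"],
    references=["the speaker produced this passage"],
)
print(f"WER: {wer:.3f}")  # (substitutions + deletions + insertions) / reference words
```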

Environmental Impacts

Output of CodeCarbon for one fine-tuning run (duration in seconds, power in watts, energy in kWh, emissions in kg CO₂eq):

timestamp: 2025-08-29T22:13:49
duration: 3332.9584745599996
emissions: 0.0696567052363746
emissions_rate: 2.089936186365788e-05
cpu_power: 42.5
gpu_power: 50.06983315793876
ram_power: 38.0
cpu_energy: 0.0390040343676826
gpu_energy: 0.074261856353882
ram_energy: 0.034693371266176
energy_consumed: 0.1479592619877408
os: Linux-6.1.123+-x86_64-with-glibc2.35
python_version: 3.12.11
codecarbon_version: 3.0.4
cpu_count: 12
cpu_model: Intel(R) Xeon(R) CPU @ 2.20GHz
gpu_count: 1
gpu_model: 1 x NVIDIA A100-SXM4-40GB
ram_total_size: 83.4760627746582
tracking_mode: machine
on_cloud: N
pue: 1.0
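
For reference, a report like the one above can be produced by wrapping the training run in a CodeCarbon tracker. A minimal sketch (the actual integration used in the notebook is not shown in this card):

```python
# Sketch: track the estimated emissions of a training run with CodeCarbon.
from codecarbon import EmissionsTracker

def fine_tune():
    """Stand-in for the actual Whisper fine-tuning run."""
    pass

tracker = EmissionsTracker()  # writes an emissions.csv with the fields above
tracker.start()
try:
    fine_tune()
finally:
    emissions_kg = tracker.stop()  # total estimated emissions in kg CO2eq
print(f"Estimated emissions: {emissions_kg:.4f} kg CO2eq")
```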