# enenlhet-whisper-model
The model in this repository was developed by Samuel J. Huskey in partial fulfillment of a seed funding grant from the University of Oklahoma's Data Institute for Societal Challenges (DISC). The project's title is "AI for Cost-Effective Research Workflows When Funding is Scarce" (co-PIs: Samuel J. Huskey, Raina Heaton, and Caroline T. Schroeder). Varun Sayapaneni, Research Informatics Specialist at OU Libraries, provided valuable insights and made important contributions to this project.
See the code repository at https://github.com/sjhuskey/enenlhet for more information.
## Model description
- Developed by: Samuel J. Huskey
- Funded by: University of Oklahoma Data Institute for Societal Challenges
- Model type: Automatic Speech Recognition
- License: Apache License 2.0
- Fine-tuned from model: openai/whisper-small
- Dataset: sjhuskey/enenlhet-whisper-dataset
## Uses
This model is intended to provide first-pass transcriptions of field recordings of Enenlhet speakers.
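A minimal sketch of producing a first-pass transcription, assuming the `transformers` library and the Hub id `sjhuskey/enenlhet-whisper-model`; the import is deferred so the snippet only requires `transformers` when a transcription is actually requested:

```python
MODEL_ID = "sjhuskey/enenlhet-whisper-model"

def transcribe(path: str) -> str:
    """Return a first-pass transcription of a .wav file of Enenlhet speech."""
    # Deferred import: only needed when the function is called.
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model=MODEL_ID,
        chunk_length_s=30,  # Whisper processes audio in 30-second windows
    )
    return asr(path)["text"]

# Example usage: print(transcribe("field_recording.wav"))
```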
## Training and evaluation data
The training and evaluation data consist of .wav files of Enenlhet speakers and .eaf files containing transcriptions of those recordings.
The raw data was preprocessed with the script at https://github.com/sjhuskey/enenlhet/blob/main/python/whisper-prepare-dataset.py, which
segments the .eaf files and normalizes and segments the audio files.
The dataset was split on a 90-5-5 basis into train, test, and validation sets.
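The 90-5-5 split can be sketched with a seeded shuffle. This is illustrative only; the actual split is performed in the preparation script linked above:

```python
import random

def split_90_5_5(items, seed=42):
    """Shuffle and split a sequence into 90% train, 5% test, 5% validation."""
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded for reproducibility
    n_train = int(len(items) * 0.90)
    n_test = int(len(items) * 0.05)
    train = items[:n_train]
    test = items[n_train:n_train + n_test]
    validation = items[n_train + n_test:]
    return train, test, validation

train, test, validation = split_90_5_5(range(100))
# With 100 items: 90 train, 5 test, 5 validation
```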
The model was trained on Colab with a version of the Jupyter notebook at https://github.com/sjhuskey/enenlhet/blob/main/python/whisper_train_colab.ipynb, which was based on the tutorial at https://huggingface.co/blog/fine-tune-whisper#prepare-environment.
## Training procedure

### Training hyperparameters
```python
per_device_train_batch_size=16,
gradient_accumulation_steps=2,
learning_rate=1.25e-5,
predict_with_generate=False,
generation_max_length=80,
warmup_steps=500,
fp16=True,
eval_strategy="epoch",
num_train_epochs=15,
save_strategy="epoch",
save_total_limit=15,
logging_steps=10,
load_best_model_at_end=True,
metric_for_best_model="eval_loss",
greater_is_better=False
```
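Note that the effective batch size is the per-device batch size times the number of gradient accumulation steps:

```python
per_device_train_batch_size = 16
gradient_accumulation_steps = 2

# Gradients from 2 batches of 16 examples are accumulated before each
# optimizer step, so each parameter update sees 32 examples.
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
# effective_batch_size == 32
```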
### Training results
| Epoch | Training Loss | Validation Loss | WER |
|---|---|---|---|
| 1 | 3.994700 | 3.676985 | 1.591700 |
| 2 | 2.001900 | 1.887706 | 1.597055 |
| 3 | 1.541100 | 1.465359 | 1.804552 |
| 4 | 0.967300 | 1.218838 | 1.504685 |
| 5 | 0.529700 | 0.959480 | 1.105756 |
| 6 | 0.281000 | 0.960863 | 1.733601 |
| 7 | 0.122700 | 0.968978 | 1.437751 |
| 8 | 0.065600 | 0.992428 | 1.341365 |
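WER values above 1.0 in the table are possible because word error rate is the word-level edit distance divided by the number of words in the reference, so heavy insertion errors can push it past 100%. A self-contained sketch of the computation:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + insertions + deletions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Levenshtein distance over words, via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

# A hypothesis with many insertions can exceed 1.0:
print(wer("one two", "a b c one two"))  # 1.5
```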
## Environmental Impacts

Output of CodeCarbon for one fine-tuning run (duration in seconds, power in W, energy in kWh, emissions in kg CO₂-eq):
```yaml
timestamp: 2025-08-29T22:13:49
duration: 3332.9584745599996
emissions: 0.0696567052363746
emissions_rate: 2.089936186365788e-05
cpu_power: 42.5
gpu_power: 50.06983315793876
ram_power: 38.0
cpu_energy: 0.0390040343676826
gpu_energy: 0.074261856353882
ram_energy: 0.034693371266176
energy_consumed: 0.1479592619877408
os: Linux-6.1.123+-x86_64-with-glibc2.35
python_version: 3.12.11
codecarbon_version: 3.0.4
cpu_count: 12
cpu_model: Intel(R) Xeon(R) CPU @ 2.20GHz
gpu_count: 1
gpu_model: 1 x NVIDIA A100-SXM4-40GB
ram_total_size: 83.4760627746582
tracking_mode: machine
on_cloud: N
pue: 1.0
```
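As a sanity check on the report, `energy_consumed` is the sum of the CPU, GPU, and RAM energies (in kWh), and `emissions_rate` is total emissions divided by duration:

```python
# Values copied from the CodeCarbon report above.
cpu_energy = 0.0390040343676826   # kWh
gpu_energy = 0.074261856353882    # kWh
ram_energy = 0.034693371266176    # kWh
emissions = 0.0696567052363746    # kg CO2-eq
duration = 3332.9584745599996     # seconds

energy_consumed = cpu_energy + gpu_energy + ram_energy
emissions_rate = emissions / duration

# Both match the reported values to floating-point precision.
assert abs(energy_consumed - 0.1479592619877408) < 1e-12
assert abs(emissions_rate - 2.089936186365788e-05) < 1e-12
```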