# enenlhet-wav2vec2
The model in this repository was developed by Samuel J. Huskey in partial fulfillment of a seed funding grant from the University of Oklahoma's Data Institute for Societal Challenges (DISC). The project's title is "AI for Cost-Effective Research Workflows When Funding is Scarce" (co-PIs: Samuel J. Huskey, Raina Heaton, and Caroline T. Schroeder). Varun Sayapaneni, Research Informatics Specialist at OU Libraries, made important contributions to the project and offered valuable insights throughout.
See the code repository at https://github.com/sjhuskey/enenlhet for more information.
## Model Description
- Developed by: Samuel J. Huskey
- Funded by: University of Oklahoma Data Institute for Societal Challenges
- Model type: Automatic Speech Recognition
- License: Apache License 2.0
- Finetuned from model: facebook/wav2vec2-large-xlsr-53
- Dataset: sjhuskey/enenlhet-wav2vec2-dataset
## Uses
This model is intended to provide first-pass transcriptions of field recordings of Enenlhet speakers.
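A minimal inference sketch using the `transformers` pipeline API; the audio path is a placeholder:

```python
from transformers import pipeline

# Load the fine-tuned checkpoint from the Hub as an ASR pipeline.
asr = pipeline(
    "automatic-speech-recognition",
    model="sjhuskey/enenlhet-wav2vec2-model",
)

# Transcribe a field recording (placeholder path). wav2vec2 expects
# 16 kHz audio; the pipeline resamples file input via ffmpeg.
result = asr("recording.wav")
print(result["text"])
```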
## Training Details
### Training Data
The training and evaluation data consist of `.wav` files of Enenlhet speakers and `.eaf` (ELAN) files containing transcriptions of those recordings.
The raw data was preprocessed with the script at https://github.com/sjhuskey/enenlhet/blob/main/python/wav2vec2-prepare-dataset.py, which
segments the `.eaf` files and normalizes and segments the corresponding audio files.
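A simplified sketch of that kind of segmentation, assuming `pympi` for reading ELAN files and `librosa`/`soundfile` for audio; the tier name and file paths are illustrative, and the linked script remains the authoritative implementation:

```python
import librosa
import pympi
import soundfile as sf

# Illustrative only; see the linked script for the actual preprocessing.
eaf = pympi.Elan.Eaf("session.eaf")                # placeholder path
audio, sr = librosa.load("session.wav", sr=16000)  # resample to 16 kHz mono

# Each annotation on a tier is a (start_ms, end_ms, text) triple;
# the tier name "transcription" is hypothetical.
for i, (start_ms, end_ms, text) in enumerate(
    eaf.get_annotation_data_for_tier("transcription")
):
    # Cut the audio clip matching the annotation's time span.
    clip = audio[int(start_ms / 1000 * sr):int(end_ms / 1000 * sr)]
    sf.write(f"segment_{i:04d}.wav", clip, sr)
```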
The dataset was split 90/5/5 into train, test, and validation sets.
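A sketch of such a split with the `datasets` library, assuming the data arrives as a single split; the seed is an assumption:

```python
from datasets import load_dataset

ds = load_dataset("sjhuskey/enenlhet-wav2vec2-dataset", split="train")

# Hold out 10%, then halve the holdout into test and validation sets.
split = ds.train_test_split(test_size=0.10, seed=42)
holdout = split["test"].train_test_split(test_size=0.50, seed=42)
train_ds, test_ds, val_ds = split["train"], holdout["train"], holdout["test"]
```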
The model was trained on Colab with a version of the Jupyter notebook at https://github.com/sjhuskey/enenlhet/blob/main/python/wav2vec2_train_colab.ipynb, which was based on the tutorial at https://huggingface.co/blog/fine-tune-w2v2-bert.
### Training Hyperparameters
These are the keyword arguments passed to `transformers.TrainingArguments`:
```python
group_by_length=True,
per_device_train_batch_size=16,
gradient_accumulation_steps=2,
eval_strategy="epoch",
num_train_epochs=60,
save_strategy="epoch",
fp16=True,
logging_steps=10,
learning_rate=5e-5,
warmup_ratio=0.1,
save_total_limit=20,
load_best_model_at_end=True,
metric_for_best_model="eval_loss",
greater_is_better=False,
```
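A condensed sketch of how these arguments feed a `Trainer`, following the general shape of the linked notebook; the output directory, dataset names, and omitted pieces (tokenizer/processor setup, data collator, metrics) are assumptions:

```python
from transformers import Trainer, TrainingArguments, Wav2Vec2ForCTC

# The CTC head must match the Enenlhet character vocabulary; that
# tokenizer/processor setup lives in the linked notebook and is omitted here.
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-large-xlsr-53")

args = TrainingArguments(
    output_dir="enenlhet-wav2vec2",  # placeholder
    group_by_length=True,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=2,
    eval_strategy="epoch",
    num_train_epochs=60,
    save_strategy="epoch",
    fp16=True,
    logging_steps=10,
    learning_rate=5e-5,
    warmup_ratio=0.1,
    save_total_limit=20,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,  # splits from the preprocessing step above
    eval_dataset=val_ds,
    # data_collator and compute_metrics omitted for brevity
)
trainer.train()
```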
### Training Results
| Epoch | Training Loss | Validation Loss | WER | CER |
|---|---|---|---|---|
| 1 | 9.739200 | 7.026410 | 1.000000 | 1.000000 |
| 2 | 3.333400 | 3.148107 | 1.000000 | 1.000000 |
| 3 | 2.852200 | 2.819278 | 1.000000 | 1.000000 |
| 4 | 2.849000 | 2.760045 | 1.000000 | 1.000000 |
| 5 | 2.836700 | 2.586708 | 1.000000 | 1.000000 |
| 6 | 1.626700 | 1.224432 | 1.005532 | 0.351696 |
| 7 | 1.113500 | 0.854202 | 0.961893 | 0.275788 |
| 8 | 0.998200 | 0.699204 | 0.900430 | 0.245716 |
| 9 | 0.935100 | 0.623386 | 0.849416 | 0.219500 |
| 10 | 0.837600 | 0.543402 | 0.823602 | 0.208105 |
| 11 | 0.674800 | 0.503909 | 0.779348 | 0.196282 |
| 12 | 0.609300 | 0.462162 | 0.759066 | 0.185829 |
| 13 | 0.587600 | 0.444128 | 0.755993 | 0.186515 |
| 14 | 0.635400 | 0.423765 | 0.738783 | 0.180346 |
| 15 | 0.547000 | 0.415437 | 0.726490 | 0.175463 |
| 16 | 0.499700 | 0.370264 | 0.697603 | 0.165353 |
| 17 | 0.537900 | 0.353593 | 0.685925 | 0.160898 |
| 18 | 0.503400 | 0.344040 | 0.675476 | 0.159527 |
| 19 | 0.454500 | 0.335758 | 0.659496 | 0.155072 |
| 20 | 0.473700 | 0.334173 | 0.655808 | 0.155329 |
| 21 | 0.436400 | 0.326168 | 0.647818 | 0.151731 |
| 22 | 0.402600 | 0.306144 | 0.629994 | 0.147104 |
| 23 | 0.394500 | 0.311927 | 0.618316 | 0.145905 |
| 24 | 0.413800 | 0.297368 | 0.622618 | 0.142392 |
| 25 | 0.379200 | 0.288900 | 0.607253 | 0.141107 |
| 26 | 0.343500 | 0.285305 | 0.596189 | 0.139051 |
| 27 | 0.343700 | 0.276769 | 0.591272 | 0.137080 |
| 28 | 0.337600 | 0.274651 | 0.586355 | 0.136223 |
## Environmental Impact
CodeCarbon's log for one fine-tuning run (duration in seconds; power in watts; energy in kWh; emissions in kg CO₂-eq; RAM size in GB):
```yaml
timestamp: 2025-08-29T17:14:37
duration: 3772.417071911
emissions: 0.0891595389213348
emissions_rate: 2.3634592151861177e-05
cpu_power: 42.5
gpu_power: 56.70040165244587
ram_power: 38.0
cpu_energy: 0.0444480260015881
gpu_energy: 0.105260611708422
ram_energy: 0.039677000516805
energy_consumed: 0.1893856382268154
os: Linux-6.1.123+-x86_64-with-glibc2.35
python_version: 3.12.11
codecarbon_version: 3.0.4
cpu_count: 12
cpu_model: Intel(R) Xeon(R) CPU @ 2.20GHz
gpu_count: 1
gpu_model: 1 x NVIDIA A100-SXM4-40GB
ram_total_size: 83.4760627746582
tracking_mode: machine
on_cloud: N
pue: 1.0
```
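The log above comes from CodeCarbon; a minimal sketch of wrapping a training run in its `EmissionsTracker` (the output file name is an assumption):

```python
from codecarbon import EmissionsTracker

# Measure energy use and estimated CO2 emissions for the run.
tracker = EmissionsTracker(output_file="emissions.csv")
tracker.start()
try:
    trainer.train()  # the Trainer from the sketch above
finally:
    emissions = tracker.stop()  # returns estimated emissions in kg CO2-eq
    print(f"Estimated emissions: {emissions:.4f} kg CO2-eq")
```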
## Evaluation Results
Self-reported results on the validation split of sjhuskey/enenlhet-wav2vec2-dataset:
- WER: 0.590
- CER: 0.140
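These metrics can be computed with the Hugging Face `evaluate` library; a sketch with placeholder strings standing in for decoded model output and gold transcriptions:

```python
import evaluate

wer_metric = evaluate.load("wer")
cer_metric = evaluate.load("cer")

# Placeholders: in practice these are decoded predictions on the
# validation split and the corresponding reference transcriptions.
predictions = ["decoded hypothesis one", "decoded hypothesis two"]
references = ["reference sentence one", "reference sentence two"]

print("WER:", wer_metric.compute(predictions=predictions, references=references))
print("CER:", cer_metric.compute(predictions=predictions, references=references))
```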