# enenlhet-wav2vec2

The model in this repository was developed by Samuel J. Huskey in partial fulfillment of a seed funding grant from the University of Oklahoma's Data Institute for Societal Challenges (DISC). The project's title is "AI for Cost-Effective Research Workflows When Funding is Scarce" (co-PIs: Samuel J. Huskey, Raina Heaton, and Caroline T. Schroeder). Varun Sayapaneni, Research Informatics Specialist at OU Libraries, contributed valuable insights and important work to the project.

See the code repository at https://github.com/sjhuskey/enenlhet for more information.

## Model Description

This is a fine-tuned wav2vec 2.0 model for automatic speech recognition (ASR) of Enenlhet, trained on field recordings of Enenlhet speakers (see Training Details below). The checkpoint contains roughly 0.3B parameters, stored as float32 safetensors.

## Uses

This model is intended to provide first-pass transcriptions of field recordings of Enenlhet speakers.
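A minimal usage sketch with the `transformers` ASR pipeline; the audio file path below is hypothetical, and chunking is enabled to handle long field recordings:

```python
from transformers import pipeline

# Produce a first-pass transcription of a field recording.
# "field_recording.wav" is a placeholder path.
asr = pipeline(
    "automatic-speech-recognition",
    model="sjhuskey/enenlhet-wav2vec2-model",
    chunk_length_s=30,  # transcribe long recordings in 30-second chunks
)
print(asr("field_recording.wav")["text"])
```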

## Training Details

### Training Data

The training and evaluation data consist of .wav recordings of Enenlhet speakers and .eaf (ELAN) files containing transcriptions of those recordings.

The raw data were preprocessed with the script at https://github.com/sjhuskey/enenlhet/blob/main/python/wav2vec2-prepare-dataset.py, which segments the transcriptions in the .eaf files and normalizes and splits the corresponding audio files.
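As an illustration (not the linked script itself), this is the kind of segmentation such preprocessing performs, using `pympi-ling` to read ELAN annotations and `pydub` to cut the audio; the tier name and file paths are hypothetical:

```python
import os
import pympi
from pydub import AudioSegment

# Read utterance-level annotations from an ELAN .eaf file and cut the
# matching .wav into 16 kHz mono clips. "transcription" is a
# hypothetical tier name.
os.makedirs("segments", exist_ok=True)
eaf = pympi.Elan.Eaf("recording.eaf")
audio = AudioSegment.from_wav("recording.wav").set_frame_rate(16000).set_channels(1)

for i, (start_ms, end_ms, text) in enumerate(
    eaf.get_annotation_data_for_tier("transcription")
):
    if text.strip():  # skip empty annotations
        audio[start_ms:end_ms].export(f"segments/recording_{i:04d}.wav", format="wav")
        # pair each clip path with `text` in a manifest for training
```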

The dataset was split on a 90-5-5 basis into train, test, and validation sets.
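A sketch of such a 90-5-5 split with the `datasets` library; `records` is a hypothetical list of `{"audio": ..., "text": ...}` pairs produced by the preprocessing script, and the seed and variable names are assumptions:

```python
from datasets import Dataset

# Split 90-5-5: hold out 10%, then halve the holdout.
ds = Dataset.from_list(records)
split = ds.train_test_split(test_size=0.10, seed=42)              # 90% / 10%
holdout = split["test"].train_test_split(test_size=0.5, seed=42)  # 5% / 5%
train_dataset = split["train"]
validation_dataset = holdout["train"]
test_dataset = holdout["test"]
```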

The model was trained on Colab with a version of the Jupyter notebook at https://github.com/sjhuskey/enenlhet/blob/main/python/wav2vec2_train_colab.ipynb, which was based on the tutorial at https://huggingface.co/blog/fine-tune-w2v2-bert.

### Training Hyperparameters

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="enenlhet-wav2vec2",  # hypothetical path; not given in the source
    group_by_length=True,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=2,
    eval_strategy="epoch",
    num_train_epochs=60,
    save_strategy="epoch",
    fp16=True,
    logging_steps=10,
    learning_rate=5e-5,
    warmup_ratio=0.1,
    save_total_limit=20,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)
```
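For context, a minimal sketch of how these arguments plug into the `Trainer` API, following the tutorial linked above; `model`, `data_collator`, `compute_metrics`, `processor`, and the dataset variables are assumptions standing in for objects built in the training notebook:

```python
from transformers import Trainer

# Wire the hyperparameters above into a Trainer and fine-tune.
trainer = Trainer(
    model=model,                      # e.g. a CTC model loaded in the notebook (assumed)
    args=training_args,               # the TrainingArguments above
    train_dataset=train_dataset,      # 90% split (assumed variable name)
    eval_dataset=validation_dataset,  # 5% validation split (assumed variable name)
    data_collator=data_collator,      # CTC padding collator from the tutorial (assumed)
    compute_metrics=compute_metrics,  # WER/CER computation (assumed)
    tokenizer=processor.feature_extractor,  # `processing_class=processor` in newer transformers
)
trainer.train()
```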

### Training Results

| Epoch | Training Loss | Validation Loss | WER | CER |
|------:|--------------:|----------------:|---------:|---------:|
| 1 | 9.739200 | 7.026410 | 1.000000 | 1.000000 |
| 2 | 3.333400 | 3.148107 | 1.000000 | 1.000000 |
| 3 | 2.852200 | 2.819278 | 1.000000 | 1.000000 |
| 4 | 2.849000 | 2.760045 | 1.000000 | 1.000000 |
| 5 | 2.836700 | 2.586708 | 1.000000 | 1.000000 |
| 6 | 1.626700 | 1.224432 | 1.005532 | 0.351696 |
| 7 | 1.113500 | 0.854202 | 0.961893 | 0.275788 |
| 8 | 0.998200 | 0.699204 | 0.900430 | 0.245716 |
| 9 | 0.935100 | 0.623386 | 0.849416 | 0.219500 |
| 10 | 0.837600 | 0.543402 | 0.823602 | 0.208105 |
| 11 | 0.674800 | 0.503909 | 0.779348 | 0.196282 |
| 12 | 0.609300 | 0.462162 | 0.759066 | 0.185829 |
| 13 | 0.587600 | 0.444128 | 0.755993 | 0.186515 |
| 14 | 0.635400 | 0.423765 | 0.738783 | 0.180346 |
| 15 | 0.547000 | 0.415437 | 0.726490 | 0.175463 |
| 16 | 0.499700 | 0.370264 | 0.697603 | 0.165353 |
| 17 | 0.537900 | 0.353593 | 0.685925 | 0.160898 |
| 18 | 0.503400 | 0.344040 | 0.675476 | 0.159527 |
| 19 | 0.454500 | 0.335758 | 0.659496 | 0.155072 |
| 20 | 0.473700 | 0.334173 | 0.655808 | 0.155329 |
| 21 | 0.436400 | 0.326168 | 0.647818 | 0.151731 |
| 22 | 0.402600 | 0.306144 | 0.629994 | 0.147104 |
| 23 | 0.394500 | 0.311927 | 0.618316 | 0.145905 |
| 24 | 0.413800 | 0.297368 | 0.622618 | 0.142392 |
| 25 | 0.379200 | 0.288900 | 0.607253 | 0.141107 |
| 26 | 0.343500 | 0.285305 | 0.596189 | 0.139051 |
| 27 | 0.343700 | 0.276769 | 0.591272 | 0.137080 |
| 28 | 0.337600 | 0.274651 | 0.586355 | 0.136223 |
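The WER and CER columns can be reproduced with the `evaluate` library (backed by `jiwer`); a minimal sketch with placeholder strings:

```python
import evaluate

# Word error rate and character error rate, as reported in the table above.
wer_metric = evaluate.load("wer")
cer_metric = evaluate.load("cer")

predictions = ["hypothetical model transcription"]  # placeholder examples
references = ["reference transcription"]

print("WER:", wer_metric.compute(predictions=predictions, references=references))
print("CER:", cer_metric.compute(predictions=predictions, references=references))
```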

## Environmental Impact

The CodeCarbon log for one fine-tuning run (duration in seconds, power in W, energy in kWh, emissions in kg CO₂-eq):

| Field | Value |
|---|---|
| timestamp | 2025-08-29T17:14:37 |
| duration | 3772.42 s (≈ 63 min) |
| emissions | 0.0892 kg CO₂-eq |
| emissions_rate | 2.36e-05 kg CO₂-eq/s |
| cpu_power | 42.5 W |
| gpu_power | 56.70 W |
| ram_power | 38.0 W |
| cpu_energy | 0.0444 kWh |
| gpu_energy | 0.1053 kWh |
| ram_energy | 0.0397 kWh |
| energy_consumed | 0.1894 kWh |
| os | Linux-6.1.123+-x86_64-with-glibc2.35 |
| python_version | 3.12.11 |
| codecarbon_version | 3.0.4 |
| cpu_count | 12 |
| cpu_model | Intel(R) Xeon(R) CPU @ 2.20GHz |
| gpu_count | 1 |
| gpu_model | 1 x NVIDIA A100-SXM4-40GB |
| ram_total_size | 83.48 GB |
| tracking_mode | machine |
| on_cloud | N |
| pue | 1.0 |
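For reference, a minimal sketch of how a log like the one above can be produced with CodeCarbon; whether the training notebook invoked the tracker exactly this way is an assumption:

```python
from codecarbon import EmissionsTracker

# Wrap the training run with CodeCarbon's tracker to measure energy
# use and estimate emissions. `trainer` is the Trainer from the sketch
# above; the project name is hypothetical.
tracker = EmissionsTracker(project_name="enenlhet-wav2vec2")
tracker.start()
try:
    trainer.train()
finally:
    emissions_kg = tracker.stop()  # returns estimated emissions in kg CO2-eq
    print(f"Estimated emissions: {emissions_kg:.4f} kg CO2-eq")
```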